
A Comparison of a Subjective and Statistical Method for Establishing Score Comparability in an Organizational Culture Survey

Journal of Business and Psychology

Abstract

Purpose

The purpose of this study was to compare the results of a subjective and a statistical method of detecting non-comparable items in 14 language-translated forms of a survey measuring organizational culture.

Design/Methodology/Approach

Data were obtained from a large multinational organization using a 60-item organizational culture survey. Each of the 14 language-translated forms was administered to members of its respective language group and compared to the original English (United States) form. Subjective reviews were conducted by fluent bilingual organizational members, who flagged items they did not believe were comparable. Statistical analyses based on an item response theory (IRT) approach were used to detect problematic items in 14 samples with sizes ranging from 304 to 3,014. Detection patterns from these two approaches were then compared.

Findings

The subjective approach identified far fewer items as problematic, and its flags did not agree with those of the statistical approach.

Implications

Our results suggest that the subjective approach, used as a pre-screening adaptation procedure, adds little value beyond a careful translation/back-translation procedure.

Originality/Value

The use of language-adapted organizational surveys has become increasingly important to multinational organizations. In examining scores from such surveys, establishing score comparability is essential. IRT analyses are often used to determine whether scores are comparable across language translations. It is also common for subjective reviews of item content to be conducted before statistical analyses to determine whether the translated items are comparable to the original form. No research known to the authors has compared modern statistical and subjective approaches to addressing these issues.


Notes

  1. Ryan et al. (2000) utilized the Differential Functioning of Items and Tests (DFIT) framework. See Oshima and Morris (2008) for a clear and accessible summary of the DFIT approach.

  2. Throughout this manuscript, we refer to the LR approach as “the statistical” approach. However, as one reviewer noted, there are many statistical approaches, and the choice among them is itself subjective to some extent. We have drawn on past research evidence to choose a method that has received strong empirical support. A minimal sketch of the LR test appears after these notes.

  3. It should be noted that these analyses represent initial translations of the DOCS (Denison Organizational Culture Survey), and these language-translated forms have since undergone substantial revisions. For example, the proportion of items shown to have DIF in the most recent Japanese form was significantly smaller, z = 2.47, p < .05; a test of this kind is sketched after these notes. Details of this analysis can be obtained by contacting the first author.

  4. As a measure of association we utilized Cramér’s φ, calculated as \( \varphi = \sqrt{\frac{\chi^{2}}{n(r - 1)}} \), where n is the number of observations and r is the number of rows (see Cramér 1945); a worked example appears after these notes.
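
To make the LR comparison in Note 2 concrete, the following is a minimal, runnable sketch of its final step, assuming the maximized log-likelihoods of the constrained model (the studied item's parameters held equal across the English and translated forms) and the free model (those parameters allowed to differ) have already been obtained from IRT software such as MULTILOG (Thissen 2003) or IRTLRDIF (Thissen 2001). The log-likelihood values and degrees of freedom below are illustrative placeholders, not values from this study.

```python
# Sketch of the IRT likelihood-ratio (LR) DIF comparison: two nested
# graded-response models are compared via G^2 = 2(lnL_free - lnL_constrained),
# which is referred to a chi-square distribution.
from scipy.stats import chi2

def lr_dif(loglik_constrained, loglik_free, df):
    """Return the LR statistic and its chi-square(df) p-value."""
    g2 = 2.0 * (loglik_free - loglik_constrained)
    return g2, chi2.sf(g2, df)

# A 5-category graded item has one discrimination and four threshold
# parameters, so freeing it across two groups adds df = 5.
g2, p = lr_dif(loglik_constrained=-10234.7,  # hypothetical log-likelihoods
               loglik_free=-10221.3,
               df=5)
print(f"G2 = {g2:.2f}, p = {p:.4f}")  # flag the item as DIF if p < .05
```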
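
Note 3 reports z = 2.47 for the drop in the proportion of DIF items. Below is a sketch of the standard two-proportion z-test we assume underlies a comparison of this kind; the item counts are hypothetical placeholders, not the study's data.

```python
# Two-proportion z-test: compares the share of flagged items in two forms
# using the pooled proportion under H0: p1 == p2.
from math import sqrt
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Return z and the two-tailed p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical example: 20 of 60 items flagged in the initial Japanese form
# versus 8 of 60 in the revised form.
z, p = two_proportion_z(20, 60, 8, 60)
print(f"z = {z:.2f}, p = {p:.4f}")
```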
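
Finally, the coefficient in Note 4 can be computed directly. The snippet below applies the formula to a hypothetical 2 × 2 agreement table (subjective flag × statistical flag); the counts are invented for illustration, and for a 2 × 2 table the footnote's r (number of rows) coincides with the usual min(rows, columns) convention.

```python
# Cramér's phi: phi = sqrt(chi2 / (n * (r - 1))), per Note 4.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[5, 10],    # rows: flagged / not flagged by reviewers
                  [12, 33]])  # cols: flagged / not flagged by the LR test
chi2_stat, p, dof, expected = chi2_contingency(table, correction=False)
n = table.sum()
r = table.shape[0]            # number of rows, as defined in Note 4
phi = np.sqrt(chi2_stat / (n * (r - 1)))
print(f"chi2 = {chi2_stat:.3f}, p = {p:.3f}, Cramér's phi = {phi:.3f}")
```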

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

  • Baker, M., Olohan, M., & Perez, M. C. (2010). Text and context: Essays on translation and interpreting in honour of Ian Mason. Manchester, UK: Saint Jerome Publishing.

  • Behrend, T. S., Thompson, L. F., Meade, A. W., Newton, D. A., & Grayson, M. S. (2008). Measurement invariance in careers research: Using IRT to study gender differences in medical students’ specialization decisions. Journal of Career Development, 35, 60–83.

  • Berdie, F. S. (1971). What test questions are likely to offend the general public. Journal of Educational Measurement, 8, 87–93.

  • Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071.

  • Brown, R. T., Reynolds, C. R., & Whitaker, J. S. (1999). Bias in mental testing since “Bias in Mental Testing”. School Psychology Quarterly, 14, 208–238.

  • Budgell, G. R., Raju, N. S., & Quartetti, D. A. (1995). Analysis of differential item functioning in translated assessment instruments. Applied Psychological Measurement, 19, 309–321.

  • Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253–260.

  • Carter, N. T., & Zickar, M. J. (2011). A comparison of the LR and DFIT frameworks of differential functioning to the generalized graded unfolding model. Applied Psychological Measurement, 35, 623–642.

  • Clark, P. C., & LaHuis, D. M. (2011). An examination of power and type I errors for two differential item functioning indices using the graded response model. Organizational Research Methods. doi:10.1177/1094428111403815.

  • Cramér, H. (1945). Mathematical methods of statistics. Uppsala: Almqvist & Wiksells.

  • De Ayala, R. J., Kim, S.-H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2, 243–276.

  • Denison, D. R. (1984). Bringing corporate culture to the bottom line. Organizational Dynamics, 13, 4–22.

  • Denison, D. R. (1990). Corporate culture and organizational effectiveness. New York: Wiley.

  • Denison, D. R. (2000). Organizational culture: Can it be a key lever for driving organizational change? In C. L. Cooper, S. Cartwright, & P. C. Earley (Eds.), The international handbook of organizational culture and climate (pp. 347–372). New York: Wiley.

  • Denison, D. R., Janovics, J., Young, J., & Cho, H. J. (2007). Diagnosing organizational cultures: Validating a model and method. Working paper, International Institute for Management Development, Lausanne, Switzerland.

  • Denison, D. R., & Mishra, A. K. (1995). Toward a theory of organizational culture and effectiveness. Organization Science, 6, 204–223.

  • Denison, D. R., & Neale, W. (1996). Denison organizational culture survey. Ann Arbor: Aviat.

  • Denison, D. R., & Neale, W. (2000). Denison organizational culture survey. Ann Arbor, MI: Denison Consulting.

  • Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are central issues. Psychological Bulletin, 95, 134–135.

  • Drasgow, F., Levine, M. V., Tsien, S., Williams, B. A., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143–165.

  • Ellis, B. B., Becker, P., & Kimmel, H. D. (1993). An item response theory evaluation of an English version of the Trier Personality Inventory. Journal of Cross-Cultural Psychology, 23, 133–148.

  • Ellis, B. B., Minsel, B., & Becker, P. (1989). Evaluation of attitude survey translations: An investigation using item response theory. International Journal of Psychology, 24, 665–684.

  • Fey, C. F., & Denison, D. R. (2003). Organizational culture and effectiveness: Can American theory be applied in Russia? Organization Science, 14, 686–706.

  • Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38, 164–187.

  • Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1–10.

  • Guion, R. M. (1980). On Trinitarian doctrines of validity. Professional Psychology, 11, 385–398.

  • Hambleton, R. K., & de Jong, J. H. A. L. (2003). Advances in translating and adapting educational and psychological tests. Language Testing, 20, 127–134.

  • Hambleton, R. K., & Kanjee, A. (1995). Increasing the validity of cross-cultural assessments: Use of improved methods for test adaptation. European Journal of Psychological Assessment, 11, 147–157.

  • Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1, 1–12.

  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

  • Highhouse, S. (2008). Stubborn reliance on intuition and subjectivity in employee selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 333–342.

  • Hofstede, G. (1991). Empirical models of cultural differences. In N. Bleichrodt & P. J. D. Drenth (Eds.), Contemporary issues in cross-cultural psychology (pp. 4–20). Lisse: Swets & Zeitlinger.

  • Huang, C. D., Church, A. T., & Katigbak, M. S. (1997). Identifying cultural differences in items and traits: Differential item functioning in the NEO personality inventory. Journal of Cross-Cultural Psychology, 28, 192–218.

  • Hulin, C. L., Drasgow, F., & Komocar, J. (1982). Applications of item response theory to analysis of attitude scale translations. Journal of Applied Psychology, 67, 818–825.

  • Hulin, C. L., & Mayer, L. J. (1986). Psychometric equivalence of a translation of the job descriptive index into Hebrew. Journal of Applied Psychology, 71, 83–94.

  • Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.

  • Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

  • Kim, S.-H., & Cohen, A. S. (1995). A comparison of Lord’s Chi-square, Raju’s area measures, and the likelihood ratio test on detection of differential item functioning. Applied Measurement in Education, 8, 291–312.

  • Kim, S.-H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345–355.

  • Landy, F. J. (2005). Employment discrimination litigation: Behavioral, quantitative, and legal perspectives. San Francisco: Jossey-Bass.

  • McKay, R., Breslow, M., Sangster, R., Gabbard, S., Reynolds, R., Nakamoto, J., et al. (1996). Translating survey questionnaires: Lessons learned. New Directions for Evaluation, 70, 93–104.

  • Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.

  • Oshima, T. C., & Morris, S. B. (2008). Raju’s differential functioning of items and tests. Educational Measurement: Issues and Practice, 27, 43–50.

  • Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71, 1530–1548.

  • Plake, B. S. (1980). A comparison of a statistical and subjective procedure to ascertain item validity: One step in the test validation process. Educational and Psychological Measurement, 40, 397–404.

  • Ramirez, M., Ford, M. E., Stewart, A. L., & Teresi, J. A. (2005). Measurement issues in health disparities research. Health Services Research, 40, 1640–1657.

  • Ramsey, P. (1993). Sensitivity review: The ETS experience as a case study. In P. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.

  • Rogers, T. B. (1992). Antecedents of operationalism: A case history in radical positivism. In C. W. Tolman (Ed.), Positivism in psychology: Historical and contemporary problems. London: Springer-Verlag.

  • Ross, S. J., & Okabe, J. (2006). The subjective and objective interface of bias detection on languages. International Journal of Testing, 6, 229–253.

  • Ryan, A. M., Chan, D., Ployhart, R. E., & Slade, A. L. (1999). Employee attitude surveys in a multinational organization: Considering language and culture in assessing measurement equivalence. Personnel Psychology, 52, 37–58.

  • Ryan, A. M., Horvath, M., Ployhart, R. E., Schmitt, N., & Slade, L. A. (2000). Hypothesizing differential item functioning in global employee opinion surveys. Personnel Psychology, 53, 531–562.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114.

  • Sandoval, J., & Miille, M. W. (1980). Accuracy of judgments of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249–253.

  • Searcy, C. A., & Lautenschlager, G. J. (1999). A Monte Carlo investigation of DIF assessment for polytomously scored items. Paper presented at the 14th annual meeting of the Society for Industrial and Organizational Psychology, Atlanta, GA.

  • Thissen, D. (2001). IRTLRDIFv2.0b: Software for the computation of the statistics involved in item response theory-based likelihood ratio tests for differential item functioning. Chapel Hill, NC: University of North Carolina.

  • Thissen, D. (2003). MULTILOG 7.0: Multiple, categorical item analysis and test scoring using item response theory [Computer Program]. Chicago, IL: Scientific Software, Inc.

  • Van de Vijver, F., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89–99.

  • Willgerodt, M. A. (2003). Using focus groups to develop culturally relevant instruments. Western Journal of Nursing Research, 25, 798–814.

  • Woods, C. M. (2011). DIF testing for ordinal items with poly-SIBTEST, the Mantel and GMH tests, and IRT-LR-DIF when the latent distribution is nonnormal for both groups. Applied Psychological Measurement, 35, 145–164.

  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning: Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.


Author information

Correspondence to Nathan T. Carter.


About this article

Cite this article

Carter, N.T., Kotrba, L.M., Diab, D.L. et al. A Comparison of a Subjective and Statistical Method for Establishing Score Comparability in an Organizational Culture Survey. J Bus Psychol 27, 451–466 (2012). https://doi.org/10.1007/s10869-011-9254-1

