Comparability of Survey Measurements

  • Daniel L. Oberski


Whenever two or more survey statistics are compared, the question arises whether this comparison is warranted. Warranted usually means that there is no methodological artifact that could possibly explain any differences: I term this the “strong” interpretation of comparability. The “weak” interpretation of comparability is then that artifacts might exist, but evidence shows that they are not strong enough to explain away a particular substantive finding. In this chapter I discuss some methods to prevent, detect, and correct for incomparability. Translation issues and coding of design characteristics of questions in different countries are particularly relevant to cross-cultural studies. Strong and weak comparability, and the methods associated with them, are discussed for different aspects of total survey error (TSE). On the “measurement side” of TSE, invariance testing, differential item functioning, and anchoring vignettes are well-known techniques. On the “representation side,” I discuss the use of the R-indicator to provide evidence that the comparison of survey statistics is warranted.
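Of the techniques named above, the R-indicator of Schouten, Cobben, and Bethlehem has a particularly simple form: R(ρ) = 1 − 2·S(ρ), where S(ρ) is the standard deviation of the units' (estimated) response propensities. The sketch below is a minimal illustration of that formula, not the authors' implementation; in practice the propensities would first be estimated, e.g. by a logistic regression of response on auxiliary frame variables.

```python
import numpy as np

def r_indicator(propensities):
    """Sample-based R-indicator: R(rho) = 1 - 2 * S(rho),
    where S(rho) is the standard deviation of the response
    propensities. R = 1 means every unit is equally likely to
    respond (fully representative response); lower values signal
    differential nonresponse that may threaten comparability."""
    rho = np.asarray(propensities, dtype=float)
    return 1.0 - 2.0 * rho.std(ddof=1)

# Equal propensities: representative response, R = 1.
print(r_indicator([0.6, 0.6, 0.6, 0.6]))

# Strongly varying propensities: R drops well below 1.
print(r_indicator([0.1, 0.9, 0.1, 0.9]))
```

Comparing R-indicators across countries or survey rounds gives evidence (in the "weak" sense above) on whether differential nonresponse could plausibly explain an observed difference.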


Keywords: Comparability · Weak comparability · Comparative surveys · Equivalence · Inequivalence · Invariance · Invariance testing · Item bias · Differential item functioning (DIF) · Method bias · R-indicator · Maximum bias · Differential nonresponse · Measurement error · Differential measurement error · Reliability · Unreliability · Response function · Translation · Cross-cultural · Cross-group · Question coding system · SQP · Anchoring vignettes



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Joint Program in Survey Methodology, University of Maryland, College Park, USA
