
Optimum number of Response categories

Current Psychology

Abstract

Through linear transformations of raw item scores, the paper converts 3-point, 4-point, 5-point and 7-point items to continuous, monotonic, normally distributed scores ranging from 1 to 5. This provides a platform for meaningful comparisons of scales with different numbers of response categories with respect to parameters such as reliability, validity and discriminating power, and allows analysis in a parametric setup. The method makes no assumption of continuity, linearity or normality for either the observed variables or the underlying variable being measured. This simple, assumption-free method can therefore have wide applicability. Such methods of converting scores of Likert items are recommended for their clear theoretical advantages and ease of calculation. An inverse relationship is derived between new measures of test discriminating value, expressed in terms of the coefficient of variation (CV), and theoretically defined test reliability. Empirically, such an inverse relationship was observed for the scales. The number of response categories did not show much influence on discriminating value, reliability or factorial validity, even for the transformed normalized scores in the range 1 to 5. Thus, the study could not find an optimum number of response categories that maximizes validity, reliability or discriminating value. Future studies with multiple data sets are suggested to test the generalizability of the findings.
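The idea of putting items with different numbers of response categories on a common 1 to 5 range, and then summarizing spread by the coefficient of variation, can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the paper's procedure: it uses a plain min-max linear rescaling `1 + 4 * (x - 1) / (k - 1)` for a response `x` on a `k`-point item coded 1..k. That rescaling preserves the shape of the raw distribution, so it does not produce the normally distributed scores the abstract describes. The function names `rescale_to_1_5` and `coefficient_of_variation`, and the response data, are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def rescale_to_1_5(score: int, k: int) -> float:
    """Linearly map a response on a k-point item (coded 1..k) onto [1, 5].

    Assumption: a simple min-max rescaling, NOT the normalizing
    transformation described in the paper.
    """
    if k < 2:
        raise ValueError("an item needs at least two response categories")
    if not 1 <= score <= k:
        raise ValueError(f"score {score} is outside 1..{k}")
    return 1 + 4 * (score - 1) / (k - 1)


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (mean must be positive)."""
    m = mean(values)
    if m <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return stdev(values) / m


# Hypothetical responses of 4 respondents to a 3-item, 7-point scale
responses = [
    [7, 6, 5],
    [4, 4, 3],
    [2, 3, 1],
    [5, 7, 6],
]
k = 7

# Test score of each respondent = sum of the rescaled item scores
totals = [sum(rescale_to_1_5(x, k) for x in row) for row in responses]
print([round(t, 2) for t in totals])
print(round(coefficient_of_variation(totals), 3))
```

Because every rescaled item score lies in [1, 5], test scores are strictly positive, so the CV is always defined. Scales with 3, 4, 5 or 7 categories can be compared on the same range by changing `k`.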


Data Availability

Nil (the paper used hypothetical data).


Code Availability

No software package or custom code was used.

Author information


Correspondence to Satyendra Nath Chakrabartty.

Ethics declarations

Ethical Statement

This is a methodological paper; no ethical approval was required.

Informed Consent

Not applicable; the paper uses hypothetical data.

Conflict of Interests

The author has no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chakrabartty, S.N. Optimum number of Response categories. Curr Psychol 42, 5590–5598 (2023). https://doi.org/10.1007/s12144-021-01866-6


  • DOI: https://doi.org/10.1007/s12144-021-01866-6
