When Rating Format Induces Different Rating Processes: The Effects of Descriptive and Evaluative Rating Modes on Discriminability and Accuracy

Original Paper, Journal of Business and Psychology

Abstract

Purpose

We examined how different rating formats, alone and in interaction with the purpose of rating (administrative vs. developmental), induce different performance rating processes, and what consequences these processes have for rating accuracy.

Design/methodology/approach

In two experiments, participants rated seven targets presented on videotape, using rating modes that gave access to (a) descriptive knowledge (the rating scales were the target's observable behaviors: Descriptive Behavior, DB), (b) evaluative knowledge (the rating scales were other people's behaviors that the target tends to afford: Evaluative Behavior, EB), or (c) a mix of the two knowledge types (the rating scales were traits). Indexes of discriminability (within- and between-ratee discriminability) and of accuracy (differential elevation and differential accuracy) were computed.
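
Differential elevation and differential accuracy are two of the rating-accuracy components defined by Cronbach (1955). The abstract does not say how they were computed, so the sketch below assumes the correlational versions often used in performance-rating research, in which higher values mean greater accuracy: differential elevation correlates a rater's mean rating of each ratee with that ratee's true mean, and differential accuracy correlates the ratee × dimension interaction terms of the ratings with those of the true scores. The function names, the choice of Pearson correlations rather than distance scores, and the example data are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # ZeroDivisionError if either input is constant


def ratee_means(matrix):
    """Mean rating of each ratee (row) across all dimensions (columns)."""
    return [mean(row) for row in matrix]


def interaction_terms(matrix):
    """Ratee x dimension interaction: x_ij - ratee mean - dimension mean + grand mean."""
    rows = ratee_means(matrix)
    cols = [mean(col) for col in zip(*matrix)]
    grand = mean(rows)  # equals the grand mean because every row has the same length
    return [
        x - rows[i] - cols[j] + grand
        for i, row in enumerate(matrix)
        for j, x in enumerate(row)
    ]


def differential_elevation(ratings, true_scores):
    """Assumed correlational DE: do rated ratee means track true ratee means?"""
    return pearson(ratee_means(ratings), ratee_means(true_scores))


def differential_accuracy(ratings, true_scores):
    """Assumed correlational DA: do rated ratee profiles track true profiles?"""
    return pearson(interaction_terms(ratings), interaction_terms(true_scores))


# Hypothetical data: 3 ratees (rows) x 3 rating dimensions (columns).
true_scores = [[6, 4, 2], [5, 5, 5], [2, 3, 4]]
# This hypothetical rater reproduces each ratee's profile but shifts whole ratees up or down.
ratings = [[6, 4, 2], [7, 7, 7], [1, 2, 3]]

if __name__ == "__main__":
    print("DE:", round(differential_elevation(ratings, true_scores), 3))
    print("DA:", round(differential_accuracy(ratings, true_scores), 3))
```

Because adding a constant to all of a ratee's ratings leaves the interaction terms unchanged, this rater's differential accuracy equals 1 (up to floating-point error) while differential elevation falls slightly below 1. A rater who gives every ratee a flat profile has no interaction variance, so the sketch would raise `ZeroDivisionError`; a real analysis would need to handle that case.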

Findings

The results showed that EB rating scales led to higher between-ratee discriminability and differential elevation than the other modes of rating, whereas DB rating scales led to higher within-ratee discriminability than the other modes.
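
The findings contrast two discriminability indexes whose formulas are not given in this excerpt. The sketch below assumes one common operationalization (compare Pulakos et al., 1986, on standard deviations across dimensions within ratees): within-ratee discriminability is the standard deviation of a ratee's ratings across dimensions, averaged over ratees, and between-ratee discriminability is the standard deviation of ratings across ratees on each dimension, averaged over dimensions. The function names, the use of the population standard deviation, and the two example raters are hypothetical.

```python
from statistics import mean, pstdev


def within_ratee_discriminability(ratings):
    """Mean over ratees of the SD of each ratee's ratings across dimensions.

    High values: the rater separates a ratee's strengths from weaknesses."""
    return mean(pstdev(row) for row in ratings)


def between_ratee_discriminability(ratings):
    """Mean over dimensions of the SD of ratings across ratees on that dimension.

    High values: the rater separates ratees from one another."""
    return mean(pstdev(col) for col in zip(*ratings))


# Hypothetical raters; rows are 3 ratees, columns are 3 dimensions.
halo_rater = [[5, 5, 5], [3, 3, 3], [1, 1, 1]]     # orders ratees, flat profiles
profile_rater = [[1, 3, 5], [1, 3, 5], [1, 3, 5]]  # one profile applied to everyone

if __name__ == "__main__":
    for name, r in [("halo", halo_rater), ("profile", profile_rater)]:
        print(name,
              round(within_ratee_discriminability(r), 3),
              round(between_ratee_discriminability(r), 3))
```

The two example raters sit at opposite extremes: the first separates ratees but not dimensions (zero within-ratee discriminability), while the second separates dimensions but not ratees (zero between-ratee discriminability).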

Implications

Our results indicate that EB rating scales are more suited to comparing different ratees (e.g., an administrative purpose for rating), whereas DB scales are more suited to identifying strengths and weaknesses of a particular ratee (e.g., a developmental purpose).

Originality/value

Our experiments are the first to apply dual-knowledge (descriptive vs. evaluative) theory to a performance appraisal context and to examine rating purpose in interaction with these two forms of person knowledge. The results, consistent with theoretical predictions, indicate that using rating scales with different types of content as a function of the rating purpose will produce more appropriate performance ratings.

Notes

  1. A third important purpose is a research function, notably when researchers conduct criterion-related validation studies. Although our experiments do not address this purpose, it is important to note that the concern in criterion-related validation studies is between-ratee discriminability, not within-ratee discriminability. Thus, the present results could also have implications for research uses of appraisal.

  2. Murphy and Balzer (1986) obtained results that contradict this hypothesis: ratings showed greater differential elevation (DEL) with specific (behavior-based) items than with global (trait-based) items. However, because their global and specific items were not comparable (the global items were not derived from the specific items), we did not take these results into account when formulating our hypotheses.

  3. The behaviors submitted to the pretests (60 EBs and 60 DBs) were drawn from the most frequent behaviors in a large pool of EBs and DBs obtained in a pilot study in which 80 undergraduate students reported the first EB (or DB) that came to mind for each of the 40 traits initially selected.

  4. These pilot tests were conducted with two groups of graduate students who were blind to the intended performance level of each tape. The first group (N = 11) ranked the 7 managers by performance level. The second group (N = 14) ranked the 12 scales for each manager, from the scale on which the manager was most competent to the one on which the manager was least competent. In the first test (ranking of the managers), the managers were ranked as intended, although not all differences were significant. In the second test (ranking of the scales for each manager), there was some variability in the rankings, but in every case a scale on which a manager's performance was effective received a higher rank than a scale on which it was ineffective. Participants perceived performance differences between scales when these differences were large (i.e., more than 4 ranks apart) but did not identify subtler differences (i.e., 1 or 2 ranks apart). This was not problematic: the important result was that the differences between the scales were explicit enough to allow participants to achieve within-ratee discriminability.

  5. The order of presentation of the 7 targets was excluded from all analyses in Studies 1 and 2 because no main or interaction effect involving it was significant.

  6. Using restricted maximum likelihood estimation did not change the results in Study 1 or Study 2.

  7. We are grateful to an anonymous reviewer for suggesting this test.

References

  • Abele, A. E., Uchronski, M., Suitner, C., & Wojciszke, B. (2008). Toward an operationalization of the fundamental dimensions of agency and communion: Trait content ratings in five countries considering valence and frequency of word occurrence. European Journal of Social Psychology, 38, 1202–1217. doi:10.1002/ejsp.575.

  • Bassili, J. N. (1989). Traits as action categories versus traits as person attributes in social cognition. In J. N. Bassili (Ed.), On-line cognition in person perception. Hillsdale, NJ: Lawrence Erlbaum.

  • Beauvois, J. L. (1987). The intuitive personologist and the individual differences model. European Journal of Social Psychology, 17, 81–94. doi:10.1002/ejsp.2420170108.

  • Beauvois, J. L., & Dubois, N. (2000). Affordances in social judgment: Experimental proof of why it is a mistake to ignore how others behave towards a target and look solely at how the target behaves. Swiss Journal of Psychology, 59, 16–33. doi:10.1024//1421-0185.59.1.16.

  • Bernardin, H. J., & Beatty, R. W. (1984). Performance appraisal: Assessing human behavior at work. Boston: Kent.

  • Borman, W. C. (1977). Consistency of rating accuracy and rating errors in the judgment of human performance. Organizational Behavior & Human Performance, 20, 238–252. doi:10.1016/0030-5073(77)90004-6.

  • Borman, W. C., Bryant, R. H., & Dorio, J. (2010). The measurement of task performance as criteria in selection research. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp. 439–461). New York: Routledge.

  • Cambon, L., Djouari, A., & Beauvois, J. L. (2006). Social judgment norms and social utility: When it is more valuable to be useful than desirable. Swiss Journal of Psychology, 65, 167–180. doi:10.1024/1421-0185.65.3.167.

  • Cawley, B. D., Keeping, L. M., & Levy, P. E. (1998). Participation in the performance appraisal process and employee reactions: A meta-analytic review of field investigations. Journal of Applied Psychology, 83, 615–633. doi:10.1037//0021-9010.83.4.615.

  • Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, 130–135. doi:10.1037//0021-9010.74.1.130.

  • Cronbach, L. J. (1955). Processes affecting scores on “understanding of others” and “assumed similarity.” Psychological Bulletin, 52, 177–193. doi:10.1037/h0044919.

  • Curtis, A. B., Harvey, R. D., & Ravden, D. (2005). Sources of political distortions in performance appraisals: Appraisal purpose and rater accountability. Group and Organization Management, 30, 42–60. doi:10.1177/1059601104267666.

  • DeNisi, A. S., & Peters, L. H. (1996). Organization of information in memory and the performance appraisal process: Evidence from the field. Journal of Applied Psychology, 81, 717–737. doi:10.1037//0021-9010.81.6.717.

  • DeNisi, A. S., Robbins, T. L., & Summers, T. P. (1997). Organization, processing, and use of performance information: A cognitive role for appraisal instruments. Journal of Applied Social Psychology, 27, 1884–1905. doi:10.1111/j.1559-1816.1997.tb01630.x.

  • DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93, 880–896. doi:10.1037/0022-3514.93.5.880.

  • Dubois, N., & Tarquinio, C. (1998). Le traitement de l’information évaluative par des professionnels de l’évaluation sociale (Evaluative information processing by professionals of social evaluation). Revue Internationale de Psychologie Sociale, 11, 99–122.

  • Fay, C. H., & Latham, G. P. (1982). Effects of training and rating scales on rating errors. Personnel Psychology, 35, 105–116. doi:10.1111/j.1744-6570.1982.tb02188.x.

  • Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin, 101, 75–90. doi:10.1037//0033-2909.101.1.75.

  • Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with interjudge agreement. Journal of Personality and Social Psychology, 52, 409–418. doi:10.1037//0022-3514.52.2.409.

  • Goffin, R. D., & Olson, J. M. (2011). Is it all relative? Comparative judgments and the possible improvement of self-ratings and ratings of others. Perspectives on Psychological Science, 6, 48–60. doi:10.1177/1745691610393521.

  • Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34. doi:10.1037//0003-066X.48.1.26.

  • Greenberg, J. (1986). Determinants of perceived fairness of performance evaluations. Journal of Applied Psychology, 71, 340–342. doi:10.1037//0021-9010.71.2.340.

  • Heneman, R. L. (1988). Traits, behaviors, and rater training: Some unexpected results. Human Performance, 1, 85–98. doi:10.1207/s15327043hup0102_1.

  • John, O. P. (1990). The “Big Five” factor taxonomy: Dimensions of personality in the natural language and in questionnaires. In L. A. Pervin (Ed.), Handbook of personality: Theory and research (pp. 66–100). New York: Guilford.

  • Judd, C. M., James-Hawkins, L., Yzerbyt, V., & Kashima, Y. (2005). Fundamental dimensions of social judgment: Understanding the relations between judgments of competence and warmth. Journal of Personality and Social Psychology, 89, 899–913. doi:10.1037/0022-3514.89.6.899.

  • Lamiell, J. T. (1981). Toward an idiothetic psychology of personality. American Psychologist, 36, 276–289. doi:10.1037//0003-066X.36.3.276.

  • Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72–107. doi:10.1037//0033-2909.87.1.72.

  • Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods, theory, and applications. New York: Academic Press.

  • Latham, G. P., & Wexley, K. N. (1977). Behavioral observation scales for performance appraisal purposes. Personnel Psychology, 30, 255–268. doi:10.1111/j.1744-6570.1977.tb02092.x.

  • McArthur, L. Z., & Baron, R. M. (1983). Toward an ecological theory of social perception. Psychological Review, 90, 215–238. doi:10.1037//0033-295X.90.3.215.

  • McDonald, T. (1991). The effect of dimension content on observation and ratings of job performance. Organizational Behavior and Human Decision Processes, 48, 252–271. doi:10.1016/0749-5978(91)90014-K.

  • McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46. doi:10.1037//1082-989X.1.1.30.

  • Mignon, A., & Mollaret, P. (2002). Applying the affordance conception of traits: A person perception study. Personality and Social Psychology Bulletin, 28, 1327–1334. doi:10.1177/014616702236825.

  • Montoya, R. M., & Horton, R. S. (2004). On the importance of cognitive evaluation as a determinant of interpersonal attraction. Journal of Personality and Social Psychology, 86, 696–712. doi:10.1037/0022-3514.86.5.696.

  • Murphy, K. R., & Balzer, W. K. (1986). Systematic distortions in memory-based behavior ratings and performance evaluations: Consequences for rating accuracy. Journal of Applied Psychology, 71, 39–44. doi:10.1037//0021-9010.71.1.39.

  • Murphy, K. R., & Cleveland, J. N. (1991). Performance appraisal: An organizational perspective. Needham Heights, MA: Allyn & Bacon.

  • Murphy, K. R., & Constans, J. I. (1987). Behavioral anchors as a source of bias in rating. Journal of Applied Psychology, 72, 573–577. doi:10.1037//0021-9010.72.4.573.

  • Murphy, K. R., Garcia, M., Kerkar, S., Martin, C., & Balzer, W. K. (1982a). Relationship between observational accuracy and accuracy in evaluating performance. Journal of Applied Psychology, 67, 320–325. doi:10.1037//0021-9010.67.3.320.

  • Murphy, K. R., Kellam, K. L., Balzer, W. K., & Armstrong, J. G. (1984). Effects of the purpose of rating on accuracy in observing teacher behavior and evaluating teaching performance. Journal of Educational Psychology, 76, 45–54. doi:10.1037//0022-0663.76.1.45.

  • Murphy, K. R., Martin, C., & Garcia, M. (1982b). Do behavioral observation scales measure observation? Journal of Applied Psychology, 67, 562–567. doi:10.1037//0021-9010.67.5.562.

  • Murphy, K. R., & Pardaffy, V. A. (1989). Bias in Behaviorally Anchored Rating Scales: Global or scale-specific? Journal of Applied Psychology, 74, 343–346. doi:10.1037//0021-9010.74.2.343.

  • Paulhus, D. L., & Reynolds, S. (1995). Enhancing target variance in personality impressions: Highlighting the person in person perception. Journal of Personality and Social Psychology, 69, 1233–1242. doi:10.1037//0022-3514.69.6.1233.

  • Piotrowski, M. J., Barnes-Farrell, J. L., & Esrig, F. H. (1989). Behaviorally anchored bias: A replication and extension of Murphy and Constans. Journal of Applied Psychology, 74, 823–826. doi:10.1037//0021-9010.74.5.823.

  • Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 69, 581–588. doi:10.1037//0021-9010.69.4.581.

  • Pulakos, E. D., Schmitt, N., & Ostroff, C. (1986). A warning about the use of a standard deviation across dimensions within ratees to measure halo. Journal of Applied Psychology, 71, 29–32. doi:10.1037//0021-9010.71.1.29.

  • Roch, S. G., Sternburg, A. M., & Caputo, P. M. (2007). Absolute vs. relative performance rating formats: Implications for fairness and organizational justice. International Journal of Selection and Assessment, 15, 302–316. doi:10.1111/j.1468-2389.2007.00390.x.

  • Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological sciences. American Psychologist, 44, 1276–1284. doi:10.1037//0003-066X.44.10.1276.

  • Singh, R., Simons, J. P., Young, D. P., Sim, B. S., Chai, X. T., Singh, S., & Chiou, S. Y. (2009). Trust and respect as mediators of the other- and self-profitable trait effects on interpersonal attraction. European Journal of Social Psychology, 39, 1021–1038. doi:10.1002/ejsp.605.

  • Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149–155. doi:10.1037/h0047060.

  • Srull, T. K., & Wyer, R. S., Jr. (1979). The role of category accessibility in the interpretation of information about persons: Some determinants and implications. Journal of Personality and Social Psychology, 37, 1660–1672. doi:10.1037//0022-3514.37.10.1660.

  • Suitner, C., & Maass, A. (2008). The role of valence in the perception of agency and communion. European Journal of Social Psychology, 38, 1073–1082. doi:10.1002/ejsp.525.

  • Tziner, A., & Kopelman, R. E. (2002). Is there a preferred performance rating format? A non-psychometric perspective. Applied Psychology, 51, 479–503. doi:10.1111/1464-0597.00104.

  • Woehr, D. J., & Roch, S. (2012). Supervisory performance ratings. In N. Schmitt (Ed.), The Oxford handbook of personnel assessment and selection (pp. 517–531). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780199732579.013.0022.

  • Wojciszke, B., Abele, A. E., & Baryla, W. (2009). Two dimensions of interpersonal attitudes: Liking depends on communion, respect depends on agency. European Journal of Social Psychology, 39, 973–990. doi:10.1002/ejsp.595.

  • Wong, K. F. E., & Kwong, J. Y. Y. (2007). Effects of rater goals on rating patterns: Evidence from an experimental field study. Journal of Applied Psychology, 92, 577–585. doi:10.1037/0021-9010.92.2.577.

  • Zebrowitz, L. A., & Collins, M. A. (1997). Accurate social perception at zero acquaintance: The affordances of a Gibsonian approach. Personality and Social Psychology Review, 1, 204–223. doi:10.1207/s15327957pspr0103_2.

  • Zedeck, S., & Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purpose of the appraisal. Journal of Applied Psychology, 67, 752–758. doi:10.1037//0021-9010.67.6.752.

Author information

Correspondence to Laurent Cambon.

About this article

Cite this article

Cambon, L., Steiner, D.D. When Rating Format Induces Different Rating Processes: The Effects of Descriptive and Evaluative Rating Modes on Discriminability and Accuracy. J Bus Psychol 30, 795–812 (2015). https://doi.org/10.1007/s10869-014-9389-y

  • DOI: https://doi.org/10.1007/s10869-014-9389-y
