When Rating Format Induces Different Rating Processes: The Effects of Descriptive and Evaluative Rating Modes on Discriminability and Accuracy

Cambon, Laurent; Steiner, Dirk D.

doi:10.1007/s10869-014-9389-y

Original Paper
Published: 26 November 2014

Volume 30, pages 795–812, (2015)

Journal of Business and Psychology

Abstract

Purpose

We examined how different kinds of rating formats, and their interaction with the purpose of rating (administrative vs. developmental), induce different performance rating processes, and what consequences these processes have for rating accuracy.

Design/methodology/approach

In two experiments, participants rated seven targets presented via videotapes using modes of rating giving access to (a) descriptive knowledge (rating scales were a target’s observable behaviors: Descriptive Behavior–DB), (b) evaluative knowledge (rating scales were others’ behaviors that the target tended to afford: Evaluative Behavior–EB), or (c) a mix of the two knowledge types (rating scales were traits). Indexes of discriminability (within- and between-ratee discriminability) and of accuracy (differential elevation and differential accuracy) were collected.
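As an illustration of the two accuracy indexes named above, the following is a minimal sketch of the squared accuracy components defined by Cronbach (1955), computed for one rater. It is not the authors' analysis code: the function name `cronbach_components`, the data layout (one row per ratee, one column per rating scale), and the sample numbers are assumptions made for the example, and many studies report the square root of each component rather than the squared form used here. Lower values indicate greater accuracy.

```python
from statistics import mean


def cronbach_components(ratings, true_scores):
    """Squared Cronbach (1955) accuracy components for a single rater.

    ratings, true_scores: lists of rows, one row per ratee and one
    column per rating scale (hypothetical layout for this sketch).
    """
    n, k = len(ratings), len(ratings[0])

    def decompose(m):
        # Split a ratee-by-scale matrix into grand mean, ratee effects,
        # scale effects, and ratee-by-scale residuals.
        grand = mean(v for row in m for v in row)
        ratee_eff = [mean(row) - grand for row in m]
        scale_eff = [mean(m[i][j] for i in range(n)) - grand for j in range(k)]
        resid = [
            [m[i][j] - grand - ratee_eff[i] - scale_eff[j] for j in range(k)]
            for i in range(n)
        ]
        return grand, ratee_eff, scale_eff, resid

    xg, xr, xc, xe = decompose(ratings)
    tg, tr, tc, te = decompose(true_scores)
    return {
        # Overall level of the ratings versus the true scores.
        "elevation": (xg - tg) ** 2,
        # Accuracy in ordering ratees on overall performance.
        "differential_elevation": mean((a - b) ** 2 for a, b in zip(xr, tr)),
        # Accuracy in ordering scales averaged over ratees.
        "stereotype_accuracy": mean((a - b) ** 2 for a, b in zip(xc, tc)),
        # Accuracy in each ratee's pattern of strengths and weaknesses.
        "differential_accuracy": mean(
            (xe[i][j] - te[i][j]) ** 2 for i in range(n) for j in range(k)
        ),
    }


if __name__ == "__main__":
    # Made-up data: 3 ratees rated on 2 scales.
    true_scores = [[2, 4], [5, 3], [6, 6]]
    ratings = [[3, 4], [5, 4], [6, 7]]
    for name, value in cronbach_components(ratings, true_scores).items():
        print(f"{name}: {value:.3f}")
```

In this decomposition, differential elevation depends only on ratee means, so it reflects how well a rater distinguishes between ratees, whereas differential accuracy depends on the ratee-by-scale residuals, so it reflects how well a rater captures each ratee's profile across scales.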

Findings

The results showed that EB rating scales led to higher between-ratee discriminability and differential elevation than other modes of rating, whereas DB rating scales led to higher within-ratee discriminability than the other modes.

Implications

Our results indicate that EB rating scales are more suited to comparing different ratees (e.g., an administrative purpose for rating), whereas DB scales are more suited to identifying strengths and weaknesses of a particular ratee (e.g., a developmental purpose).

Originality/value

Our experiments are the first to apply dual-knowledge (descriptive vs. evaluative) theory to a performance appraisal context and to examine rating purpose in interaction with these two forms of person knowledge. The results, consistent with theoretical predictions, indicate that using rating scales with different types of content as a function of the rating purpose will produce more appropriate performance ratings.

Notes

A third important purpose is a research function, notably when researchers are conducting criterion-related validation studies. Although our experiments do not deal with this kind of purpose, it is important to note that the concern in criterion-related validation studies is between-ratee discriminability and not within-ratee discriminability. Thus, the results of the present research could also have implications for appraisals conducted for research purposes.
Murphy and Balzer (1986) obtained results that contradict this hypothesis. They showed that ratings had greater DEL with specific rating items (behavior-based format) than with global rating items (trait-based format). However, as their global and specific items were not comparable (global items were not derived from their specific items), we did not take these results into account in formulating our hypotheses.
The behaviors submitted to the pretests (N_EB = 60, N_DB = 60) were extracted from the most frequent behaviors in a large pool of EBs and DBs obtained in a pilot study in which 80 undergraduate students indicated the first EB (versus DB) that came to mind for the set of 40 traits initially selected.
These pilot tests were conducted with two groups of graduate students who were blind to the intended performance levels of each tape. The first group (N = 11) ranked the 7 managers according to their performance level. The second group (N = 14) ranked the 12 scales for each manager, from the one on which the manager was most competent to the one on which the manager was least competent. In the first test (ranking of the managers), the managers were ranked as intended, although not all differences were significant. In the second test (ranking of the scales for each manager), there was some variability in the rankings, but a scale on which a manager’s performance was effective always received a higher ranking than a scale on which that manager’s performance was ineffective. Participants perceived differences in performance on the scales when the differences were large (i.e., differences of more than 4 ranks), but they did not identify more subtle differences (i.e., differences of 1 or 2 ranks). This was not problematic: the important result was that the differences between the scales were sufficiently explicit to allow participants to achieve discriminability within the targets.
The order of presentation of the 7 targets was excluded from all the analyses in Studies 1 and 2 because no main or interaction effect involving it was significant.
Using restricted maximum likelihood estimation did not change the results in study 1 or in study 2.
We are grateful to an anonymous reviewer for suggesting this test.

References

Abele, A. E., Uchronski, M., Suitner, C., & Wojciszke, B. (2008). Toward an operationalization of the fundamental dimensions of agency and communion: Trait content ratings in five countries considering valence and frequency of word occurrence. European Journal of Social Psychology, 38, 1202–1217. doi:10.1002/ejsp.575.
Bassili, J. N. (1989). Traits as action categories versus traits as person attributes in social cognition. In J. N. Bassili (Ed.), On line cognition in person perception. Hillsdale, NJ: Lawrence Erlbaum.
Beauvois, J. L. (1987). The intuitive personologist and the individual differences model. European Journal of Social Psychology, 17, 81–94. doi:10.1002/ejsp.2420170108.
Beauvois, J. L., & Dubois, N. (2000). Affordances in social judgment: Experimental proof of why it is a mistake to ignore how others behave towards a target and look solely at how the target behaves. Swiss Journal of Psychology, 59, 16–33. doi:10.1024//1421-0185.59.1.16.
Bernardin, H., & Beatty, R. W. (1984). Performance appraisal: Assessing human behavior at work. Boston: Kent.
Borman, W. C. (1977). Consistency of rating accuracy and rating errors in the judgment of human performance. Organizational Behavior & Human Performance, 20, 238–252. doi:10.1016/0030-5073(77)90004-6.
Borman, W. C., Bryant, R. H., & Dorio, J. (2010). The measurement of task performance as criteria in selection research. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp. 439–461). New York: Routledge.
Cambon, L., Djouari, A., & Beauvois, J. L. (2006). Social judgment norms and social utility: When it is more valuable to be useful than desirable. Swiss Journal of Psychology, 65, 167–180. doi:10.1024/1421-0185.65.3.167.
Cawley, B. D., Keeping, L. M., & Levy, P. E. (1998). Participation in the performance appraisal process and employee reactions: A meta-analytic review of field investigations. Journal of Applied Psychology, 83, 615–633. doi:10.1037//0021-9010.83.4.615.
Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, 130–135. doi:10.1037//0021-9010.74.1.130.
Cronbach, L. (1955). Processes affecting scores on “understanding of others” and “assumed similarity.” Psychological Bulletin, 52, 177–193. doi:10.1037/h0044919.
Curtis, A. B., Harvey, R. D., & Ravden, D. (2005). Sources of political distortions in performance appraisals: Appraisal purpose and rater accountability. Group and Organization Management, 30, 42–60. doi:10.1177/1059601104267666.
DeNisi, A. S., & Peters, L. H. (1996). Organization of information in memory and the performance appraisal process: Evidence from the field. Journal of Applied Psychology, 81, 717–737. doi:10.1037//0021-9010.81.6.717.
DeNisi, A. S., Robbins, T. L., & Summers, T. P. (1997). Organization, processing, and use of performance information: A cognitive role for appraisal instruments. Journal of Applied Social Psychology, 27, 1884–1905. doi:10.1111/j.1559-1816.1997.tb01630.x.
DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93, 880–896. doi:10.1037/0022-3514.93.5.880.
Dubois, N., & Tarquinio, C. (1998). Le traitement de l’information évaluative par des professionnels de l’évaluation sociale (Evaluative information processing by professionals of social evaluation). Revue Internationale de Psychologie Sociale, 11, 99–122.
Fay, C. H., & Latham, G. P. (1982). Effects of training and rating scales on rating errors. Personnel Psychology, 35, 105–116. doi:10.1111/j.1744-6570.1982.tb02188.x.
Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin, 101, 75–90. doi:10.1037//0033-2909.101.1.75.
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with interjudge agreement. Journal of Personality and Social Psychology, 52, 409–418. doi:10.1037//0022-3514.52.2.409.
Goffin, R. D., & Olson, J. M. (2011). Is it all relative? Comparative judgments and the possible improvement of self-ratings and ratings of others. Perspectives on Psychological Science, 6, 48–60. doi:10.1177/1745691610393521.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34. doi:10.1037//0003-066X.48.1.26.
Greenberg, J. (1986). Determinants of perceived fairness of performance evaluations. Journal of Applied Psychology, 71, 340–342. doi:10.1037//0021-9010.71.2.340.
Heneman, R. L. (1988). Traits, behaviors, and rater training: Some unexpected results. Human Performance, 1, 85–98. doi:10.1207/s15327043hup0102_1.
John, O. P. (1990). The “Big Five” factor taxonomy: Dimensions of personality in the natural language and in questionnaires. In L. A. Pervin (Ed.), Handbook of personality: Theory and research (pp. 66–100). New York: Guilford.
Judd, C., James-Hawkins, L., Yzerbyt, V., & Kashima, Y. (2005). Fundamental dimensions of social judgment: Understanding the relations between judgments of competence and warmth. Journal of Personality and Social Psychology, 89, 899–913. doi:10.1037/0022-3514.89.6.899.
Lamiell, J. T. (1981). Toward an idiothetic psychology of personality. American Psychologist, 36, 276–289. doi:10.1037//0003-066X.36.3.276.
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72–107. doi:10.1037//0033-2909.87.1.72.
Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods, theory, and applications. New York: Academic Press.
Latham, G. P., & Wexley, K. N. (1977). Behavioral observation scales for performance appraisal purposes. Personnel Psychology, 30, 255–268. doi:10.1111/j.1744-6570.1977.tb02092.x.
McArthur, L. Z., & Baron, R. M. (1983). Toward an ecological theory of social perception. Psychological Review, 90, 215–238. doi:10.1037//0033-295X.90.3.215.
McDonald, T. (1991). The effect of dimension content on observation and ratings of job performance. Organizational Behavior and Human Decision Processes, 48, 252–271. doi:10.1016/0749-5978(91)90014-K.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46. doi:10.1037//1082-989X.1.1.30.
Mignon, A., & Mollaret, P. (2002). Applying the affordance conception of traits: A person perception study. Personality and Social Psychology Bulletin, 28, 1327–1334. doi:10.1177/014616702236825.
Montoya, R. M., & Horton, R. S. (2004). On the importance of cognitive evaluation as a determinant of interpersonal attraction. Journal of Personality and Social Psychology, 86, 696–712. doi:10.1037/0022-3514.86.5.696.
Murphy, K. R., & Balzer, W. K. (1986). Systematic distortions in memory-based behavior ratings and performance evaluations: Consequences for rating accuracy. Journal of Applied Psychology, 71, 39–44. doi:10.1037//0021-9010.71.1.39.
Murphy, K. R., & Cleveland, J. N. (1991). Performance appraisal: An organizational perspective. Needham Heights, MA: Allyn & Bacon.
Murphy, K. R., & Constans, J. I. (1987). Behavioral anchors as a source of bias in rating. Journal of Applied Psychology, 72, 573–577. doi:10.1037//0021-9010.72.4.573.
Murphy, K. R., Garcia, M., Kerkar, S., Martin, C., & Balzer, W. K. (1982a). Relationship between observational accuracy and accuracy in evaluating performance. Journal of Applied Psychology, 67, 320–325. doi:10.1037//0021-9010.67.3.320.
Murphy, K. R., Kellam, K. L., Balzer, W. K., & Armstrong, J. G. (1984). Effects of the purpose of rating on accuracy in observing teacher behavior and evaluating teaching performance. Journal of Educational Psychology, 76, 45–54. doi:10.1037//0022-0663.76.1.45.
Murphy, K. R., Martin, C., & Garcia, M. (1982b). Do behavioral observation scales measure observation? Journal of Applied Psychology, 67, 562–567. doi:10.1037//0021-9010.67.5.562.
Murphy, K. R., & Pardaffy, V. A. (1989). Bias in Behaviorally Anchored Rating Scales: Global or scale-specific? Journal of Applied Psychology, 74, 343–346. doi:10.1037//0021-9010.74.2.343.
Paulhus, D. L., & Reynolds, S. (1995). Enhancing target variance in personality impressions: Highlighting the person in person perception. Journal of Personality and Social Psychology, 69, 1233–1242. doi:10.1037//0022-3514.69.6.1233.
Piotrowski, M. J., Barnes-Farrell, J. L., & Esrig, F. H. (1989). Behaviorally anchored bias: A replication and extension of Murphy and Constans. Journal of Applied Psychology, 74, 823–826. doi:10.1037//0021-9010.74.5.823.
Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 69, 581–588. doi:10.1037//0021-9010.69.4.581.
Pulakos, E. D., Schmitt, N., & Ostroff, C. (1986). A warning about the use of a standard deviation across dimensions within ratees to measure halo. Journal of Applied Psychology, 71, 29–32. doi:10.1037//0021-9010.71.1.29.
Roch, S. G., Sternburg, A. M., & Caputo, P. M. (2007). Absolute vs. relative performance rating formats: Implications for fairness and organizational justice. International Journal of Selection and Assessment, 15, 302–316. doi:10.1111/j.1468-2389.2007.00390.x.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological sciences. American Psychologist, 44, 1276–1284. doi:10.1037//0003-066X.44.10.1276.
Singh, R., Simons, J. P., Young, D. P., Sim, B. S., Chai, X. T., Singh, S., & Chiou, S. Y. (2009). Trust and respect as mediators of the other- and self-profitable trait effects on interpersonal attraction. European Journal of Social Psychology, 39, 1021–1038. doi:10.1002/ejsp.605.
Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149–155. doi:10.1037/h0047060.
Srull, T. K., & Wyer, R. S., Jr. (1979). The role of category accessibility in the interpretation of information about persons: Some determinants and implications. Journal of Personality and Social Psychology, 37, 1660–1672. doi:10.1037//0022-3514.37.10.1660.
Suitner, C., & Maass, A. (2008). The role of valence in the perception of agency and communion. European Journal of Social Psychology, 38, 1073–1082. doi:10.1002/ejsp.525.
Tziner, A., & Kopelman, R. E. (2002). Is there a preferred performance rating format? A non-psychometric perspective. Applied Psychology, 51, 479–503. doi:10.1111/1464-0597.00104.
Woehr, D. J., & Roch, S. (2012). Supervisory performance ratings. In N. Schmitt (Ed.), The Oxford handbook of personnel assessment and selection (pp. 517–531). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780199732579.013.0022.
Wojciszke, B., Abele, A. E., & Baryla, W. (2009). Two dimensions of interpersonal attitudes: Liking depends on communion, respect depends on agency. European Journal of Social Psychology, 39, 973–990. doi:10.1002/ejsp.595.
Wong, K. F. E., & Kwong, J. Y. Y. (2007). Effects of rater goals on rating patterns: Evidence from an experimental field study. Journal of Applied Psychology, 92, 577–585. doi:10.1037/0021-9010.92.2.577.
Zebrowitz, L. A., & Collins, M. A. (1997). Accurate social perception at zero acquaintance: The affordances of a Gibsonian approach. Personality and Social Psychology Review, 1, 204–223. doi:10.1207/s15327957pspr0103_2.
Zedeck, S., & Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purpose of the appraisal. Journal of Applied Psychology, 67, 752–758. doi:10.1037//0021-9010.67.6.752.

Author information

Authors and Affiliations

Laboratoire d’Anthropologie et de Psychologie Cognitive et Sociale, Université de Nice-Sophia Antipolis, 24 avenue des Diables Bleus, 06357, Nice Cedex 4, France
Laurent Cambon & Dirk D. Steiner

Corresponding author

Correspondence to Laurent Cambon.

About this article

Cite this article

Cambon, L., Steiner, D.D. When Rating Format Induces Different Rating Processes: The Effects of Descriptive and Evaluative Rating Modes on Discriminability and Accuracy. J Bus Psychol 30, 795–812 (2015). https://doi.org/10.1007/s10869-014-9389-y

Issue Date: December 2015
DOI: https://doi.org/10.1007/s10869-014-9389-y
