Abstract
Purpose
We examined how different kinds of rating formats, and their interaction with purposes of rating (administrative vs. developmental), induced different performance rating processes, and what consequences these processes had for rating accuracy.
Design/methodology/approach
In two experiments, participants rated seven targets presented via videotapes using modes of rating giving access to (a) descriptive knowledge (rating scales were a target’s observable behaviors: Descriptive Behavior–DB), (b) evaluative knowledge (rating scales were others’ behaviors that the target tended to afford: Evaluative Behavior–EB), or (c) a mix of the two knowledge types (rating scales were traits). Indexes of discriminability (within- and between-ratee discriminability) and of accuracy (differential elevation and differential accuracy) were collected.
Findings
The results showed that EB rating scales led to higher between-ratee discriminability and differential elevation than other modes of rating, whereas DB rating scales led to higher within-ratee discriminability than the other modes.
Implications
Our results indicate that EB rating scales are more suited to comparing different ratees (e.g., an administrative purpose for rating), whereas DB scales are more suited to identifying strengths and weaknesses of a particular ratee (e.g., a developmental purpose).
Originality/value
Our experiments are the first to apply dual-knowledge (descriptive vs. evaluative) theory to a performance appraisal context and to examine rating purpose in interaction with these two forms of person knowledge. The results, consistent with theoretical predictions, indicate that using rating scales with different types of content as a function of the rating purpose will produce more appropriate performance ratings.
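The two accuracy indexes named in the abstract, differential elevation and differential accuracy, are conventionally computed from Cronbach's (1955) decomposition of the distance between a rater's scores and true scores. The sketch below shows that conventional decomposition; it is not necessarily the exact computation used in these experiments. It assumes a ratee-by-scale matrix of ratings and a matching matrix of true scores (for example, 7 targets by 12 scales). The function and variable names are hypothetical. Lower values indicate greater accuracy.

```python
from math import sqrt


def accuracy_components(ratings, true_scores):
    """Return differential elevation (DE) and differential accuracy (DA).

    ratings, true_scores: lists of rows, one row per ratee and one
    column per rating scale. Names and layout are illustrative assumptions.
    """
    n = len(ratings)      # number of ratees
    k = len(ratings[0])   # number of rating scales

    def row_means(m):
        return [sum(row) / k for row in m]

    def col_means(m):
        return [sum(m[i][j] for i in range(n)) / n for j in range(k)]

    def grand_mean(m):
        return sum(sum(row) for row in m) / (n * k)

    xr, tr = row_means(ratings), row_means(true_scores)
    xc, tc = col_means(ratings), col_means(true_scores)
    xg, tg = grand_mean(ratings), grand_mean(true_scores)

    # DE: how well the rater orders ratees on overall performance
    # (ratee main effects in the ratings vs. in the true scores).
    de_sq = sum(((xr[i] - xg) - (tr[i] - tg)) ** 2 for i in range(n)) / n

    # DA: how well the rater captures each ratee's pattern of strengths
    # and weaknesses (ratee-by-scale interaction residuals).
    da_sq = sum(
        (
            (ratings[i][j] - xr[i] - xc[j] + xg)
            - (true_scores[i][j] - tr[i] - tc[j] + tg)
        ) ** 2
        for i in range(n)
        for j in range(k)
    ) / (n * k)

    return {
        "differential_elevation": sqrt(de_sq),
        "differential_accuracy": sqrt(da_sq),
    }


if __name__ == "__main__":
    # Toy data: the rater reverses the overall ordering of two ratees.
    true_scores = [[1, 2], [3, 4]]
    ratings = [[3, 4], [1, 2]]
    print(accuracy_components(ratings, true_scores))
```

In this decomposition, differential elevation depends only on ratee means, which ties it to between-ratee comparisons, whereas differential accuracy depends on each ratee's profile across scales, which ties it to within-ratee comparisons.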
Notes
A third important purpose is a research function, notably when researchers are conducting criterion-related validation studies. Although our experiments do not deal with this kind of purpose, it is important to note that the concern in criterion-related validation studies is between-ratee discriminability and not within-ratee discriminability. Thus, the results of the present research could also have implications for various research purposes for appraisal.
Murphy and Balzer (1986) obtained results that contradict this hypothesis. They showed that ratings had greater DEL with specific (behavior-based format) rather than global rating items (trait-based format). However, as their global and specific items were not comparable (global items were not derived from their specific items), we did not take these results into account to formulate our hypotheses.
The behaviors submitted to the pretests (N_EB = 60, N_DB = 60) were extracted from the most frequent behaviors in a large pool of EBs and DBs obtained in a pilot study in which 80 undergraduate students indicated the first EB (versus DB) that came to mind for the set of 40 traits initially selected.
These pilot tests were conducted with two groups of graduate students who were blind to the intended performance levels of each tape. The first group (N = 11) ranked the 7 managers according to their performance level. The second group (N = 14) ranked the 12 scales for each manager from the one on which the manager was the most competent to the one on which the manager was the least competent. For the results of the first test (ranking of the managers), although not all differences were significant, the managers were ranked as intended. For the results of the second test (ranking of the scales for each manager), although there was some variability in the ranking of the scales, in every case a scale on which a manager's performance was effective received a higher ranking than a scale on which that manager's performance was ineffective. It should be noted that although participants perceived differences in performance on the scales when the differences were great (i.e., differences over 4 ranks), they did not identify more subtle differences in performance (i.e., differences of 1 and 2 ranks). This was not problematic, however: the important result was that the differences between the scales were sufficiently explicit to allow the participants to achieve discriminability within the targets.
The order of presentation of the 7 targets was excluded from all the analyses in studies 1 and 2 because no main or interaction effect involving it was significant.
Using restricted maximum likelihood estimation did not change the results in study 1 or in study 2.
We are grateful to an anonymous reviewer for suggesting this test.
References
Abele, A. E., Uchronski, M., Suitner, C., & Wojciszke, B. (2008). Toward an operationalization of the fundamental dimensions of agency and communion: Trait content ratings in five countries considering valence and frequency of word occurrence. European Journal of Social Psychology, 38, 1202–1217. doi:10.1002/ejsp.575.
Bassili, J. N. (1989). Traits as action categories versus traits as person attributes in social cognition. In J. N. Bassili (Ed.), On line cognition in person perception. Hillsdale, NJ: Lawrence Erlbaum.
Beauvois, J. L. (1987). The intuitive personologist and the individual differences model. European Journal of Social Psychology, 17, 81–94. doi:10.1002/ejsp.2420170108.
Beauvois, J. L., & Dubois, N. (2000). Affordances in social judgment: Experimental proof of why it is a mistake to ignore how others behave towards a target and look solely at how the target behaves. Swiss Journal of Psychology, 59, 16–33. doi:10.1024//1421-0185.59.1.16.
Bernardin, H., & Beatty, R. W. (1984). Performance appraisal: Assessing human behavior at work. Boston: Kent.
Borman, W. C. (1977). Consistency of rating accuracy and rating errors in the judgment of human performance. Organizational Behavior & Human Performance, 20, 238–252. doi:10.1016/0030-5073(77)90004-6.
Borman, W. C., Bryant, R. H., & Dorio, J. (2010). The measurement of task performance as criteria in selection research. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp. 439–461). New York: Routledge.
Cambon, L., Djouari, A., & Beauvois, J. L. (2006). Social judgment norms and social utility: When it is more valuable to be useful than desirable. Swiss Journal of Psychology, 65, 167–180. doi:10.1024/1421-0185.65.3.167.
Cawley, B. D., Keeping, L. M., & Levy, P. E. (1998). Participation in the performance appraisal process and employee reactions: A meta-analytic review of field investigations. Journal of Applied Psychology, 83, 615–633. doi:10.1037//0021-9010.83.4.615.
Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, 130–135. doi:10.1037//0021-9010.74.1.130.
Cronbach, L. (1955). Processes affecting scores on “understanding of others” and “assumed similarity.” Psychological Bulletin, 52, 177–193. doi:10.1037/h0044919.
Curtis, A. B., Harvey, R. D., & Ravden, D. (2005). Sources of political distortions in performance appraisals: Appraisal purpose and rater accountability. Group and Organization Management, 30, 42–60. doi:10.1177/1059601104267666.
DeNisi, A. S., & Peters, L. H. (1996). Organization of information in memory and the performance appraisal process: Evidence from the field. Journal of Applied Psychology, 81, 717–737. doi:10.1037//0021-9010.81.6.717.
DeNisi, A. S., Robbins, T. L., & Summers, T. P. (1997). Organization, processing, and use of performance information: A cognitive role for appraisal instruments. Journal of Applied Social Psychology, 27, 1884–1905. doi:10.1111/j.1559-1816.1997.tb01630.x.
DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93, 880–896. doi:10.1037/0022-3514.93.5.880.
Dubois, N., & Tarquinio, C. (1998). Le traitement de l’information évaluative par des professionnels de l’évaluation sociale. (Evaluative information processing by professionals of social evaluation). Revue Internationale de Psychologie Sociale, 11, 99–122.
Fay, C. H., & Latham, G. P. (1982). Effects of training and rating scales on rating errors. Personnel Psychology, 35, 105–116. doi:10.1111/j.1744-6570.1982.tb02188.x.
Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin, 101, 75–90. doi:10.1037//0033-2909.101.1.75.
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with interjudge agreement. Journal of Personality and Social Psychology, 52, 409–418. doi:10.1037//0022-3514.52.2.409.
Goffin, R. D., & Olson, J. M. (2011). Is it all relative? Comparative judgments and the possible improvement of self-ratings and ratings of others. Perspectives on Psychological Science, 6, 48–60. doi:10.1177/1745691610393521.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34. doi:10.1037//0003-066X.48.1.26.
Greenberg, J. (1986). Determinants of perceived fairness of performance evaluations. Journal of Applied Psychology, 71, 340–342. doi:10.1037//0021-9010.71.2.340.
Heneman, R. L. (1988). Traits, behaviors, and rater training: Some unexpected results. Human Performance, 1, 85–98. doi:10.1207/s15327043hup0102_1.
John, O. P. (1990). The “Big Five” factor taxonomy: Dimensions of personality in the natural language and in questionnaires. In L. A. Pervin (Ed.), Handbook of personality: Theory and research (pp. 66–100). New York: Guilford.
Judd, C., James-Hawkins, L., Yzerbyt, V., & Kashima, Y. (2005). Fundamental dimensions of social judgment: Understanding the relations between judgments of competence and warmth. Journal of Personality and Social Psychology, 89, 899–913. doi:10.1037/0022-3514.89.6.899.
Lamiell, J. T. (1981). Toward an idiothetic psychology of personality. American Psychologist, 36, 276–289. doi:10.1037//0003-066X.36.3.276.
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72–107. doi:10.1037//0033-2909.87.1.72.
Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods, theory, and applications. New York: Academic Press.
Latham, G. P., & Wexley, K. N. (1977). Behavioral observation scales for performance appraisal purposes. Personnel Psychology, 30, 255–268. doi:10.1111/j.1744-6570.1977.tb02092.x.
McArthur, L. Z., & Baron, R. M. (1983). Toward an ecological theory of social perception. Psychological Review, 90, 215–238. doi:10.1037//0033-295X.90.3.215.
McDonald, T. (1991). The effect of dimension content on observation and ratings of job performance. Organizational Behavior and Human Decision Processes, 48, 252–271. doi:10.1016/0749-5978(91)90014-K.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46. doi:10.1037//1082-989X.1.1.30.
Mignon, A., & Mollaret, P. (2002). Applying the affordance conception of traits: A person perception study. Personality and Social Psychology Bulletin, 28, 1327–1334. doi:10.1177/014616702236825.
Montoya, R. M., & Horton, R. S. (2004). On the importance of cognitive evaluation as a determinant of interpersonal attraction. Journal of Personality and Social Psychology, 86, 696–712. doi:10.1037/0022-3514.86.5.696.
Murphy, K. R., & Balzer, W. K. (1986). Systematic distortions in memory-based behavior ratings and performance evaluations: Consequences for rating accuracy. Journal of Applied Psychology, 71, 39–44. doi:10.1037//0021-9010.71.1.39.
Murphy, K. R., & Cleveland, J. N. (1991). Performance appraisal: An organizational perspective. Needham Heights, MA: Allyn & Bacon.
Murphy, K. R., & Constans, J. I. (1987). Behavioral anchors as a source of bias in rating. Journal of Applied Psychology, 72, 573–577. doi:10.1037//0021-9010.72.4.573.
Murphy, K. R., Garcia, M., Kerkar, S., Martin, C., & Balzer, W. K. (1982a). Relationship between observational accuracy and accuracy in evaluating performance. Journal of Applied Psychology, 67, 320–325. doi:10.1037//0021-9010.67.3.320.
Murphy, K. R., Kellam, K. L., Balzer, W. K., & Armstrong, J. G. (1984). Effects of the purpose of rating on accuracy in observing teacher behavior and evaluating teaching performance. Journal of Educational Psychology, 76, 45–54. doi:10.1037//0022-0663.76.1.45.
Murphy, K. R., Martin, C., & Garcia, M. (1982b). Do behavioral observation scales measure observation? Journal of Applied Psychology, 67, 562–567. doi:10.1037//0021-9010.67.5.562.
Murphy, K. R., & Pardaffy, V. A. (1989). Bias in Behaviorally Anchored Rating Scales: Global or scale-specific? Journal of Applied Psychology, 74, 343–346. doi:10.1037//0021-9010.74.2.343.
Paulhus, D. L., & Reynolds, S. (1995). Enhancing target variance in personality impressions: Highlighting the person in person perception. Journal of Personality and Social Psychology, 69, 1233–1242. doi:10.1037//0022-3514.69.6.1233.
Piotrowski, M. J., Barnes-Farrell, J. L., & Esrig, F. H. (1989). Behaviorally anchored bias: A replication and extension of Murphy and Constans. Journal of Applied Psychology, 74, 823–826. doi:10.1037//0021-9010.74.5.823.
Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 69, 581–588. doi:10.1037//0021-9010.69.4.581.
Pulakos, E. D., Schmitt, N., & Ostroff, C. (1986). A warning about the use of a standard deviation across dimensions within ratees to measure halo. Journal of Applied Psychology, 71, 29–32. doi:10.1037//0021-9010.71.1.29.
Roch, S. G., Sternburg, A. M., & Caputo, P. M. (2007). Absolute vs. relative performance rating formats: Implications for fairness and organizational justice. International Journal of Selection and Assessment, 15, 302–316. doi:10.1111/j.1468-2389.2007.00390.x.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological sciences. American Psychologist, 44, 1276–1284. doi:10.1037//0003-066X.44.10.1276.
Singh, R., Simons, J. P., Young, D. P., Sim, B. S., Chai, X. T., Singh, S., & Chiou, S. Y. (2009). Trust and respect as mediators of the other- and self-profitable trait effects on interpersonal attraction. European Journal of Social Psychology, 39, 1021–1038. doi:10.1002/ejsp.605.
Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149–155. doi:10.1037/h0047060.
Srull, T. K., & Wyer, R. S., Jr. (1979). The role of category accessibility in the interpretation of information about persons: Some determinants and implications. Journal of Personality and Social Psychology, 37, 1660–1672. doi:10.1037//0022-3514.37.10.1660.
Suitner, C., & Maass, A. (2008). The role of valence in the perception of agency and communion. European Journal of Social Psychology, 38, 1073–1082. doi:10.1002/ejsp.525.
Tziner, A., & Kopelman, R. E. (2002). Is there a preferred performance rating format? A non-psychometric perspective. Applied Psychology, 51, 479–503. doi:10.1111/1464-0597.00104.
Woehr, D. J., & Roch, S. (2012). Supervisory performance ratings. In N. Schmitt (Ed.), The Oxford handbook of personnel assessment and selection (pp. 517–531). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780199732579.013.0022.
Wojciszke, B., Abele, A. E., & Baryla, W. (2009). Two dimensions of interpersonal attitudes: Liking depends on communion, respect depends on agency. European Journal of Social Psychology, 39, 973–990. doi:10.1002/ejsp.595.
Wong, K. F. E., & Kwong, J. Y. Y. (2007). Effects of rater goals on rating patterns: Evidence from an experimental field study. Journal of Applied Psychology, 92, 577–585. doi:10.1037/0021-9010.92.2.577.
Zebrowitz, L. A., & Collins, M. A. (1997). Accurate social perception at zero acquaintance: The affordances of a Gibsonian approach. Personality and Social Psychology Review, 1, 204–223. doi:10.1207/s15327957pspr0103_2.
Zedeck, S., & Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purpose of the appraisal. Journal of Applied Psychology, 67, 752–758. doi:10.1037//0021-9010.67.6.752.
About this article
Cite this article
Cambon, L., Steiner, D.D. When Rating Format Induces Different Rating Processes: The Effects of Descriptive and Evaluative Rating Modes on Discriminability and Accuracy. J Bus Psychol 30, 795–812 (2015). https://doi.org/10.1007/s10869-014-9389-y
DOI: https://doi.org/10.1007/s10869-014-9389-y