Abstract
Purpose
The purpose of this study was to take an inductive approach to examining the extent to which organizational contexts represent significant sources of variance in supervisor performance ratings, and to explore factors that may explain contextual rating variability.
Design/Methodology/Approach
Using archival field performance rating data from a large state law enforcement organization, we applied a multilevel modeling approach to partition the variance in ratings attributable to ratees, raters, and rating contexts.
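As a rough illustration of what partitioning rating variance across these sources can look like, the sketch below writes an unconditional three-level random-intercept model. It assumes that ratees (i) are nested within raters (j), who are in turn nested within contexts (k), and that each ratee receives a single rating. This nesting, the notation, and all symbols are assumptions made for illustration and are not taken from the model reported in the study.

```latex
% Illustrative unconditional three-level model (assumed structure, not the authors' specification)
% Y_{ijk}: rating of ratee i given by rater j in context k
\begin{align}
Y_{ijk} &= \gamma_{000} + u_{00k} + r_{0jk} + e_{ijk}, \\
u_{00k} &\sim N(0, \tau_{\text{context}}), \quad
r_{0jk} \sim N(0, \tau_{\text{rater}}), \quad
e_{ijk} \sim N(0, \sigma^{2}), \\
\rho_{\text{context}} &= \frac{\tau_{\text{context}}}{\tau_{\text{context}} + \tau_{\text{rater}} + \sigma^{2}}, \qquad
\rho_{\text{rater}} = \frac{\tau_{\text{rater}}}{\tau_{\text{context}} + \tau_{\text{rater}} + \sigma^{2}}.
\end{align}
```

Under these assumptions, \rho_{\text{context}} is the share of total rating variance that lies between contexts, and the residual term combines ratee differences with measurement error. If the context level were left out of such a model, between-context variance would tend to be absorbed into the rater-level component.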
Findings
Results suggest that much of what is often interpreted as idiosyncratic rater variance may actually reflect systematic rating variability across contexts. In addition, performance-related and non-performance factors, including contextual rating tendencies, accounted for significant rating variability.
Implications
Supervisor ratings represent the most common approach for measuring job performance, and understanding the nature and sources of rating variability is important for research and practice. Given the many uses of performance rating data, our findings suggest that continuing to identify contextual sources of variability is particularly important for addressing criterion problems and improving ratings as a form of performance measurement.
Originality/Value
Numerous performance appraisal models suggest the importance of context; however, previous research had not partitioned the variance in supervisor ratings due to omnibus context effects in organizational settings. The use of a multilevel modeling approach allowed the examination of contextual influences, while controlling for ratee and rater characteristics.
Cite this article
Ellington, J.K., Wilson, M.A. The Performance Appraisal Milieu: A Multilevel Analysis of Context Effects in Performance Ratings. J Bus Psychol 32, 87–100 (2017). https://doi.org/10.1007/s10869-016-9437-x
DOI: https://doi.org/10.1007/s10869-016-9437-x