
Reconsidering the Equivalence of Multisource Performance Ratings: Evidence for the Importance and Meaning of Rater Factors

Published in: Journal of Business and Psychology

Abstract

Purpose

This study specified an alternative model for examining the measurement invariance of multisource performance ratings (MSPRs) in order to systematically investigate the theoretical meaning of common method variance in the form of rater effects. Rather than testing invariance with a multigroup design in which raters are aggregated within sources, the model specified both performance dimension factors and idiosyncratic rater factors.

Design/Methodology/Approach

Data were obtained from 5,278 managers, drawn from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.

Findings

Our results diverged from prior research: MSPRs lacked invariance across raters from different organizational levels. However, raters at the same level provided equivalent ratings in terms of both the performance dimension loadings and the rater factor loadings.

Implications

The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.

Originality/Value

The current study applied an alternative model for examining the invariance of MSPRs, allowing us to answer three questions that could not be addressed with more traditional multigroup designs. First, the model allowed us to examine the impact of parameterizing idiosyncratic rater factors on inferences of cross-rater invariance. Second, including multiple raters from each organizational level in the MSPR model allowed us to tease apart the degree of invariance among raters from the same source relative to raters from different sources. Finally, the model allowed for inferences about the invariance of the idiosyncratic rater factors themselves.
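The invariance comparisons described above rest on contrasting nested confirmatory factor models — a constrained model (loadings held equal across raters) against a less constrained baseline — typically evaluated with a chi-square difference (likelihood-ratio) test. The sketch below illustrates only that decision rule; the fit statistics shown are hypothetical and are not from the study:

```python
from scipy.stats import chi2

def chi_square_difference(chisq_constrained, df_constrained,
                          chisq_free, df_free, alpha=0.05):
    """Likelihood-ratio test for nested CFA models.

    A significant result means the equality constraints (e.g., equal
    loadings across raters) worsen fit, i.e., invariance is rejected.
    """
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    p_value = chi2.sf(delta_chisq, delta_df)  # upper-tail probability
    return delta_chisq, delta_df, p_value, p_value < alpha

# Hypothetical fit statistics, for illustration only
d_chi, d_df, p, reject = chi_square_difference(1520.4, 410, 1480.2, 395)
# A large drop in chi-square relative to the df difference (here,
# ~40 on 15 df) yields a small p, so invariance would be rejected.
```

In practice these statistics come from fitting the two structural equation models (e.g., in LISREL, as in the original study) and the comparison is repeated for each stage of the invariance hierarchy (configural, metric, scalar).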



Figures 1–4 appear in the full article.



Author information


Corresponding author

Correspondence to Bethany H. Bynum.



Cite this article

Bynum, B.H., Hoffman, B.J., Meade, A.W. et al. Reconsidering the Equivalence of Multisource Performance Ratings: Evidence for the Importance and Meaning of Rater Factors. J Bus Psychol 28, 203–219 (2013). https://doi.org/10.1007/s10869-012-9272-7

