Abstract
Purpose
This study specified an alternative model for examining the measurement invariance of multisource performance ratings (MSPRs) in order to systematically investigate the theoretical meaning of common method variance in the form of rater effects. Rather than testing invariance with a multigroup design in which raters are aggregated within sources, this study specified both performance dimension factors and idiosyncratic rater factors.
Design/Methodology/Approach
Data were obtained from 5,278 managers, drawn from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.
Findings
Our results diverged from prior research in that MSPRs lacked invariance across raters from different organizational levels. However, raters at the same level provided equivalent ratings in terms of both the performance dimension loadings and the rater factor loadings.
Implications
The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.
Originality/Value
The current study applies an alternative model for examining the invariance of MSPRs, allowing us to answer three questions that would not be possible with more traditional multigroup designs. First, the model allows us to examine the impact of parameterizing idiosyncratic rater factors on inferences of cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allows us to tease apart the degree of invariance among raters from the same source relative to raters from different sources. Finally, our study allows inferences with respect to the invariance of the idiosyncratic rater factors themselves.
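The measurement model described above can be illustrated with a toy simulation: each observed rating loads on both a performance-dimension factor and an idiosyncratic rater factor, so two ratings from the same rater share variance beyond what the dimensions account for. The loadings, sample size, and factor counts below are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5000        # ratees (illustrative; the study used 5,278 managers)
n_dims = 3      # performance-dimension factors (illustrative count)
n_raters = 4    # individual raters, e.g., self, boss, peer, subordinate

# Latent performance-dimension scores per ratee (uncorrelated here for simplicity)
D = rng.normal(size=(n, n_dims))
# Idiosyncratic rater factors: one latent factor per individual rater
R = rng.normal(size=(n, n_raters))

lam = 0.7  # dimension loading (equal across raters = metric invariance)
gam = 0.5  # rater-factor loading

# Observed rating of dimension d by rater r = dimension part + rater part + error
ratings = np.empty((n, n_raters, n_dims))
for r in range(n_raters):
    for d in range(n_dims):
        ratings[:, r, d] = lam * D[:, d] + gam * R[:, r] + 0.4 * rng.normal(size=n)

# Rater-factor signature: two different dimensions rated by the SAME rater
# correlate, while the same pair rated by DIFFERENT raters does not
r_same = np.corrcoef(ratings[:, 0, 0], ratings[:, 0, 1])[0, 1]
r_diff = np.corrcoef(ratings[:, 0, 0], ratings[:, 1, 1])[0, 1]
print(f"same-rater r = {r_same:.2f}, cross-rater r = {r_diff:.2f}")
```

Omitting the rater factors from such a model forces their shared variance into the dimension loadings or residuals, which is why the abstract argues that parameterizing them matters for invariance conclusions.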
Bynum, B.H., Hoffman, B.J., Meade, A.W. et al. Reconsidering the Equivalence of Multisource Performance Ratings: Evidence for the Importance and Meaning of Rater Factors. J Bus Psychol 28, 203–219 (2013). https://doi.org/10.1007/s10869-012-9272-7