Abstract
Purpose
This study specified an alternative model for examining the measurement invariance of multisource performance ratings (MSPRs) in order to systematically investigate the theoretical meaning of common method variance in the form of rater effects. Rather than testing invariance with a multigroup design in which raters are aggregated within sources, this study specified both performance dimension factors and idiosyncratic rater factors.
Design/Methodology/Approach
Data were obtained from 5,278 managers, drawn from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.
Findings
Our results diverged from prior research in that MSPRs lacked invariance across raters from different organizational levels. However, raters at the same level provided equivalent ratings in terms of both the performance dimension loadings and the rater factor loadings.
Implications
The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.
Originality/Value
The current study applies an alternative model for examining the invariance of MSPRs, allowing us to answer three questions that would not be possible with more traditional multigroup designs. First, the model allows us to examine the impact of parameterizing idiosyncratic rater factors on inferences of cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allows us to tease apart the degree of invariance among raters from the same source relative to raters from different sources. Finally, our study allows inferences with respect to the invariance of the idiosyncratic rater factors themselves.
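The measurement model described above can be illustrated with a toy simulation: each observed rating loads on both a performance-dimension factor and an idiosyncratic rater factor, so two ratings from the same rater share variance beyond what the dimensions account for. The loadings, sample size, and factor counts below are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5000        # ratees (illustrative; the study used 5,278 managers)
n_dims = 3      # performance-dimension factors (illustrative count)
n_raters = 4    # individual raters, e.g., self, boss, peer, subordinate

# Latent performance-dimension scores per ratee (uncorrelated here for simplicity)
D = rng.normal(size=(n, n_dims))
# Idiosyncratic rater factors: one latent factor per individual rater
R = rng.normal(size=(n, n_raters))

lam = 0.7  # dimension loading (equal across raters = metric invariance)
gam = 0.5  # rater-factor loading

# Observed rating of dimension d by rater r = dimension part + rater part + error
ratings = np.empty((n, n_raters, n_dims))
for r in range(n_raters):
    for d in range(n_dims):
        ratings[:, r, d] = lam * D[:, d] + gam * R[:, r] + 0.4 * rng.normal(size=n)

# Rater-factor signature: two different dimensions rated by the SAME rater
# correlate, while the same pair rated by DIFFERENT raters does not
r_same = np.corrcoef(ratings[:, 0, 0], ratings[:, 0, 1])[0, 1]
r_diff = np.corrcoef(ratings[:, 0, 0], ratings[:, 1, 1])[0, 1]
print(f"same-rater r = {r_same:.2f}, cross-rater r = {r_diff:.2f}")
```

Omitting the rater factors from such a model forces their shared variance into the dimension loadings or residuals, which is why the abstract argues that parameterizing them matters for invariance conclusions.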
Bynum, B.H., Hoffman, B.J., Meade, A.W. et al. Reconsidering the Equivalence of Multisource Performance Ratings: Evidence for the Importance and Meaning of Rater Factors. J Bus Psychol 28, 203–219 (2013). https://doi.org/10.1007/s10869-012-9272-7