Neglected considerations in the analysis of agreement among journal referees

Hargens, L. L.; Herting, J. R.

doi:10.1007/BF02130467

Published: July 1990

Volume 19, pages 91–106 (1990)

Scientometrics

L. L. Hargens¹ &
J. R. Herting²

Abstract

Studies of representative samples of submissions to scientific journals show statistically significant associations between referees' recommendations. These associations are moderately large given the multidimensional and unstable character of scientists' evaluations of papers, and composites of referees' recommendations can significantly aid editors in selecting manuscripts for publication, especially when there is great variability in the quality of submissions and acceptance rates are low. Assessments of the value of peer-review procedures in journal manuscript evaluation should take into account features of the entire scholarly communications system present in a field.

Notes and references

V. Bakanic, C. McPhail, R. J. Simon, The manuscript review and decision-making process, American Sociological Review, 52 (1987) 631.
G. J. Whitehurst, Interrater agreement for journal manuscript reviews, American Psychologist, 39 (1984) 22.
S. Lock, A Delicate Balance: Editorial Peer Review in Medicine, Philadelphia, Penn., ISI Press, 1986.
E. Garfield, Refereeing and peer review. Part 1. Opinion and conjecture on the effectiveness of refereeing, Current Contents, (1986) No. 31, 3.
E. Garfield, Refereeing and peer review. Part 2. The research on refereeing and alternatives to the present system, Current Contents, (1986) No. 32, 3.
D. P. Peters, S. J. Ceci, Peer review practices of psychological journals: The fate of published articles, submitted again, Behavioral and Brain Sciences, 5 (1982) 187. Also see the commentary on this paper in the same issue of Behavioral and Brain Sciences.
P. McReynolds, Reliability of ratings of research papers, American Psychologist, 26 (1971) 400.
C. M. Bowen, R. Perloff, J. Jacoby, Improving manuscript evaluation procedures, American Psychologist, 27 (1972) 22.
S. Cole, J. R. Cole, G. Simon, Chance and consensus in peer review, Science, 214 (1981) 881.
D. Klahr, Insiders, outsiders and efficiency in a National Science Foundation panel, American Psychologist, 40 (1985) 148.
Of course, referees' evaluations may show agreement because they reflect variables other than merit; “reliability is not the same as validity.” To our knowledge, researchers have reported only two studies of the association between scientists' evaluations of papers and independent indicators of those papers' merit. Small reported data on original referees' assessments of a sample of highly cited papers in chemistry, which showed a nonsignificant correlation between the referee assessments and the subsequent citation levels (see H. G. Small, Characteristics of Frequently Cited Papers in Chemistry. Final Report on NSF Contract NSF-C795, Philadelphia, 1974). In contrast, Gottfredson found more substantial positive correlations between citations to published psychology papers and overall judgments of those papers' quality and impact made by experts nominated by the papers' authors (see S. D. Gottfredson, Evaluating psychological research reports: Dimensions, reliability, and correlates of quality judgments, American Psychologist, 33 (1978) 920).
M. J. Mahoney, Scientist as Subject: The Psychological Imperative, Cambridge, Mass., Ballinger, 1976.
D. Lindsey, The Scientific Publication System in Social Science, San Francisco, Jossey-Bass, 1978.
L. L. Hargens, Scholarly consensus and journal rejection rates, American Sociological Review, 53 (1988) 139.
D. Lindsey, Assessing precision in the manuscript review process: A little better than chance, Scientometrics, 14 (1988) 75.
See, for example, H. M. Blalock, Jr., Social Statistics, New York, McGraw-Hill, 1979.
A. W. Ward, B. W. Hall, C. F. Schram, Evaluation of published educational research, a national study, American Educational Research Journal, 12 (1975) 109.
Unrepresentatively homogeneous samples of papers are also produced when editors summarily reject a large proportion of submissions. To the extent that editors screen out manuscripts that referees would judge to be of poor quality, the papers that remain and receive referee evaluations vary less in quality, so studies based on them will tend to show low levels of agreement between referees. High-prestige multidisciplinary journals, high-prestige medical journals, and social science journals are most likely to exhibit high summary rejection rates. See M. D. Gordon, A Study of the Evaluation of Research Papers by Primary Journals in the U.K., Leicester, England: Primary Communications Research Center, University of Leicester, 1978.
W. A. Scott, Interreferee agreement on some characteristics of manuscripts submitted to the Journal of Personality and Social Psychology, American Psychologist, 29 (1974) 698.
L. L. Hargens, J. R. Herting, A new approach to referees' assessments of manuscripts, Social Science Research (forthcoming).
Lindsey, op. cit., reference 17 above, suggests that they are likely to be lower, but he seems to base this judgment on the numerous studies subject to truncated variation rather than on those based on more representative samples of manuscripts.
H. L. Roediger III, The role of journal editors in the scientific process, in D. N. Jackson, J. P. Rushton (Eds), Scientific Excellence: Origins and Assessment, Beverly Hills, CA: Sage, 1987, 222.
B. C. Griffith, Judging document content versus social functions of refereeing: Possible and impossible tasks, Behavioral and Brain Sciences, 5 (1982) 214.
T. Saracevic, Relevance: A review of and framework for thinking on the notion in information science, Journal of the American Society for Information Science, 26 (1975) 321.
J. C. Nunnally, Psychometric Theory, New York, McGraw-Hill, 1967.
H. E. A. Tinsley, D. J. Weiss, Interrater reliability and agreement of subjective judgments, Journal of Counseling Psychology, 22 (1975) 358.
A. L. Stinchcombe, R. Ofshe, On journal editing as a probabilistic process, American Sociologist, 5 (1969) 19.
Hargens (footnote 12 in op. cit. in reference 16 above) also made this error.
Op. cit. reference 15, p. 37. Op. cit. reference 17, D. Lindsey, Assessing precision in the manuscript review process: A little better than chance, Scientometrics, 14 (1988) p. 78.
These results illustrate the point that measures with low reliability, and therefore low validity, can be valuable when selection ratios are low and there is substantial variation among the cases being evaluated (see L. J. Cronbach, Essentials of Psychological Testing (3rd Ed.), New York, Harper and Row, 1970). Lindsey and others have argued that referees' evaluations of manuscripts are more likely to be unreliable for behavioural science journals than for natural science journals, but the former are also more likely to exhibit the two conditions that enhance the practical value of even fairly unreliable evaluations. Highly selective and prestigious medical journals also exhibit these two conditions, and Lock, op. cit., estimates that the British Medical Journal accepts 80 percent of the top-quality papers submitted to it.
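A simulation built on the same kind of assumptions can show how selection ratios and variation in quality interact with unreliable evaluations. Latent quality is normal, each of two referees adds independent normal error, and the editor accepts the manuscripts with the highest mean referee score. Here "top quality" is defined, hypothetically, as the best 20 percent of submissions. The function name and every parameter value are illustrative; none is an estimate from the article.

```python
import random


def share_of_top_papers_accepted(n, accept_rate, quality_sd=1.0, noise_sd=1.2,
                                 top_fraction=0.2, seed=0):
    """Hypothetical model: latent quality q ~ N(0, quality_sd**2); each of two
    referees reports q + N(0, noise_sd**2); the editor accepts the
    `accept_rate` share of manuscripts with the highest mean referee score.
    Returns the share of accepted manuscripts that belong to the true top
    `top_fraction` of quality."""
    rng = random.Random(seed)
    papers = []
    for _ in range(n):
        q = rng.gauss(0.0, quality_sd)
        composite = ((q + rng.gauss(0.0, noise_sd)) + (q + rng.gauss(0.0, noise_sd))) / 2
        papers.append((q, composite))
    # Quality cutoff separating the true top `top_fraction` of submissions.
    qualities = sorted(q for q, _ in papers)
    cutoff = qualities[int(n * (1 - top_fraction))]
    # Accept the manuscripts with the best composite referee score.
    accepted = sorted(papers, key=lambda p: p[1], reverse=True)[: int(n * accept_rate)]
    return sum(1 for q, _ in accepted if q >= cutoff) / len(accepted)


if __name__ == "__main__":
    for rate in (0.1, 0.3, 0.5):
        share = share_of_top_papers_accepted(20_000, rate)
        print(f"acceptance rate {rate:.0%}: {share:.0%} of accepted papers are "
              f"top-quality (base rate 20%)")
```

In this sketch, lowering the acceptance rate or widening the spread of quality among submissions raises the share of accepted papers that are genuinely top-quality. That holds even though the default settings imply a single-referee reliability of only about 0.4.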
Lindsey's recommendation that journals solicit the opinions of at least three initial referees for each submission also exaggerates the benefits of such a policy by neglecting the peer-review system that journals already use. For most behavioural-science journals, which are the focus of Lindsey's discussion, a substantial minority of manuscripts (those receiving a split decision from the two initial referees and even some of those receiving two positive evaluations) already receive three referee evaluations. Using three initial referees for all papers will increase the reliability of the composite evaluations of only those papers that would receive two unfavourable evaluations under the current system. Unfortunately, using three initial referees for these papers would also slow down their evaluation, and authors appear to be more concerned about the speed of the journal review process than about the reliability of referees' evaluations (see Y. Brackbill, F. Korten, Journal reviewing practices: Authors' and APA members' suggestions for revision, American Psychologist, 27 (1972) 22). Using three initial referees might speed up the evaluation of the remaining manuscripts somewhat, because editors would not wait until the first two referees returned recommendations before soliciting the opinion of the third. However, because these manuscripts constitute a minority of submissions, the time saved on them would probably not counterbalance the longer lags in evaluating the very large proportion of manuscripts that receive only two evaluations under the current system.
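The Spearman–Brown formula from classical test theory gives a rough gauge of how much a third referee adds to the reliability of a composite evaluation. It assumes the referees are parallel measures, with equal reliability and independent errors. That assumption, the function name, and the illustrative single-referee reliability of 0.3 are ours, not figures from the article.

```python
def spearman_brown(single_rater_reliability: float, k: int) -> float:
    """Spearman-Brown prophecy formula: reliability of the mean of k parallel
    raters, given the reliability of one rater (here, the correlation between
    two referees' assessments of the same manuscript)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    r = 0.3  # hypothetical single-referee reliability, for illustration only
    for k in (1, 2, 3):
        print(f"{k} referee(s): composite reliability {spearman_brown(r, k):.2f}")
```

With that illustrative value, moving from two to three referees raises composite reliability from about 0.46 to about 0.56. In line with the note's argument, such a gain would accrue only to manuscripts that would not already receive a third evaluation under the current system.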
See Blalock, op. cit., pp. 282–290.
Lindsey (op. cit., reference 17) reports a nonsignificant chi-squared value for a “quasi-independence” model applied to data from one of these journals, Personality and Social Psychology Bulletin. Unfortunately, Lindsey does not specify which model of quasi-independence he tested. We have been able to obtain the chi-squared value he reports only by (1) treating the P&PB data in Lindsey's Table 2 as frequencies (they are actually percentages) and (2) constraining the model to reproduce the entries along the diagonal of Lindsey's Table 2 (some of these entries represent disagreement and others represent agreement). Thus, it is doubtful that Lindsey's analysis tested any meaningful hypothesis, much less the null hypothesis that referees' judgments are statistically independent.
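Both steps in this note can be made concrete. Constraining a log-linear model to reproduce every main-diagonal cell exactly is equivalent to fitting quasi-independence to the off-diagonal cells alone. For fixed cell proportions, the likelihood-ratio statistic G² grows in proportion to the table total, so treating percentages as counts silently replaces the true sample size with 100. The following sketch fits quasi-independence by iterative proportional fitting. The function names and the example table are hypothetical and do not reproduce Lindsey's data.

```python
import math


def fit_quasi_independence(table, max_iter=10_000, tol=1e-10):
    """Fit quasi-independence to a square table, treating the main-diagonal
    cells as structural zeros (excluded from the fit), by iterative
    proportional fitting. Returns the fitted table."""
    n = len(table)
    # Start from 1 in every off-diagonal cell and 0 on the diagonal.
    fitted = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    row_targets = [sum(table[i][j] for j in range(n) if j != i) for i in range(n)]
    col_targets = [sum(table[i][j] for i in range(n) if i != j) for j in range(n)]
    for _ in range(max_iter):
        # Scale each row to its observed off-diagonal total.
        for i in range(n):
            s = sum(fitted[i])
            if s > 0:
                fitted[i] = [x * row_targets[i] / s for x in fitted[i]]
        # Scale each column to its observed off-diagonal total.
        for j in range(n):
            s = sum(fitted[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    fitted[i][j] *= col_targets[j] / s
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(fitted[i]) - row_targets[i]) < tol * max(1.0, row_targets[i])
               for i in range(n)):
            break
    return fitted


def likelihood_ratio_chi2(table, fitted):
    """G^2 = 2 * sum(obs * ln(obs / fitted)) over the off-diagonal cells."""
    n = len(table)
    return sum(2 * table[i][j] * math.log(table[i][j] / fitted[i][j])
               for i in range(n) for j in range(n) if i != j and table[i][j] > 0)


def quasi_independence_df(n):
    """Degrees of freedom for an n x n table with the diagonal blanked."""
    if n < 3:
        raise ValueError("model is saturated for n < 3")
    return (n - 1) ** 2 - n


if __name__ == "__main__":
    # Hypothetical 4x4 table of recommendations (rows: referee 1, columns:
    # referee 2), invented for illustration.
    counts = [[20, 12, 6, 2], [10, 25, 14, 5], [5, 12, 30, 15], [1, 4, 10, 29]]
    g2 = likelihood_ratio_chi2(counts, fit_quasi_independence(counts))
    tenfold = [[10 * x for x in row] for row in counts]
    g2_tenfold = likelihood_ratio_chi2(tenfold, fit_quasi_independence(tenfold))
    print(f"df = {quasi_independence_df(4)}")
    print(f"G2 = {g2:.2f}; same proportions, 10x the cases: G2 = {g2_tenfold:.2f}")
```

Because the statistic is proportional to the number of cases, a chi-squared value computed from percentages is comparable to one from real counts only if the sample happened to contain exactly 100 manuscripts.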
L. A. Goodman, New methods for analyzing the intrinsic character of qualitative variables using cross-classified data, American Journal of Sociology, 93 (1987) 529.
C. C. Clogg, Using association models in sociological research: Some examples, American Journal of Sociology, 88 (1982) 114.
See J. R. Cole, S. Cole, Which researcher will get the grant?, Nature, 279 (1979) 575–576, and Gordon, op. cit. These measures include various estimates of the proportion of the total variance in referees' assessments that is between- or within-manuscript (or proposal) variance.
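One common statistic of this kind is the one-way random-effects intraclass correlation, ICC(1). It estimates the share of total variance in referees' scores that lies between manuscripts rather than within them. The sketch assumes every manuscript has the same number of numeric referee scores. The cited studies may have used different estimators, and the function name and example scores are hypothetical.

```python
def icc1(ratings):
    """One-way random-effects intraclass correlation, ICC(1).

    `ratings` holds one list of referee scores per manuscript (or proposal),
    each list of the same length k. The result estimates the share of total
    score variance that lies between manuscripts."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two manuscripts")
    k = len(ratings[0])
    if k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need the same number (>= 2) of scores per manuscript")
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    means = [sum(r) / k for r in ratings]
    # Between- and within-manuscript sums of squares from a one-way ANOVA.
    ss_between = k * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for r, m in zip(ratings, means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical scores from two referees on five manuscripts
    # (1 = reject, 4 = accept).
    scores = [[1, 2], [2, 2], [3, 2], [4, 3], [1, 1]]
    print(f"ICC(1) = {icc1(scores):.2f}")
```

Values near 1 mean that nearly all variation in scores reflects differences between manuscripts. Values near 0 (or below) mean that referees disagree about a given manuscript about as much as manuscripts differ from one another.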
See S. Cole, G. Simon, J. R. Cole, Do journal rejection rates index consensus?, American Sociological Review, 53 (1988) 152, and L. L. Hargens, Further evidence on field differences in consensus from the NSF peer review studies, American Sociological Review, 53 (1988) 157.
H. A. Zuckerman, R. K. Merton, Patterns of evaluation in science: Institutionalization, structure and functions of the referee system, Minerva, 9 (1971) 66.
R. E. Stevens, Characteristics of Subject Literatures, ACRL Monograph No. 6, Chicago, Association of College and Reference Libraries, 1953.
C. H. Brown, Scientific Serials, ACRL Monograph No. 16, Chicago, Association of College and Reference Libraries, 1956.
W. D. Garvey, N. Lin, C. E. Nelson, Some comparisons of communication activities in the physical and social sciences, in: C. E. Nelson, D. K. Pollock (Eds.), Communication among Scientists and Engineers, Lexington, Mass.: Heath, 1970, p. 61.
Op. cit. reference 16. One reason that studies of referee reliability are relatively rare for physical-science journals is that such journals often use the single-initial-referee system. Thus, data on pairs of referee assessments of all submissions are unavailable for these journals. Those manuscripts that do receive at least two independent referee evaluations under this system are an unrepresentative subset of all manuscripts. Consequently, nonexperimental data on referee agreement for these journals, such as the evidence reported by Zuckerman and Merton, should be viewed with caution.
W. D. Garvey, Communication: The Essence of Science, Oxford, Pergamon, 1979.

Author information

Authors and Affiliations

Department of Sociology, University of Illinois, 702 South Wright Street, 61801, Urbana, IL, (USA)
L. L. Hargens
Department of Sociology, Stanford University, 94305, Stanford, CA, (USA)
J. R. Herting

About this article

Cite this article

Hargens, L.L., Herting, J.R. Neglected considerations in the analysis of agreement among journal referees. Scientometrics 19, 91–106 (1990). https://doi.org/10.1007/BF02130467

Received: 04 September 1989
Issue Date: July 1990
DOI: https://doi.org/10.1007/BF02130467
