Volume 19, Issue 1–2, pp 91–106

Neglected considerations in the analysis of agreement among journal referees

  • L. L. Hargens
  • J. R. Herting


Studies of representative samples of submissions to scientific journals show statistically significant associations between referees' recommendations. These associations are moderately large given the multidimensional and unstable character of scientists' evaluations of papers, and composites of referees' recommendations can significantly aid editors in selecting manuscripts for publication, especially when there is great variability in the quality of submissions and acceptance rates are low. Assessments of the value of peer-review procedures in journal manuscript evaluation should take into account features of the entire scholarly communications system present in a field.
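One way to see why composites of referees' recommendations can aid editors is the Spearman-Brown formula from classical test theory, which gives the reliability of an average of several ratings. The formula is not stated in the abstract. The sketch below assumes that referees behave as parallel raters, and the single-referee reliability of 0.30 is a hypothetical value chosen only for illustration, not a figure from the paper.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters parallel ratings (Spearman-Brown)."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-referee reliability; an assumption, not a result from the paper.
ASSUMED_RELIABILITY = 0.30

if __name__ == "__main__":
    # Compare one referee with composites of two and three referees.
    for k in (1, 2, 3):
        print(k, spearman_brown(ASSUMED_RELIABILITY, k))
```

Under these assumptions the composite reliability rises with each added referee, but by a smaller amount each time, which is consistent with weighing the gain from a third referee against the delay it may introduce.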




Notes and references

  2. V. Bakanic, C. McPhail, R. J. Simon, The manuscript review and decision-making process, American Sociological Review, 52 (1987) 631.
  3. G. J. Whitehurst, Interrater agreement for journal manuscript reviews, American Psychologist, 39 (1984) 22.
  4. S. Lock, A Delicate Balance: Editorial Peer Review in Medicine, Philadelphia, Penn., ISI Press, 1986.
  5. E. Garfield, Refereeing and peer review. Part 1. Opinion and conjecture on the effectiveness of refereeing, Current Contents, (1986) No. 31, 3.
  6. E. Garfield, Refereeing and peer review. Part 2. The research on refereeing and alternatives to the present system, Current Contents, (1986) No. 32, 3.
  7. D. P. Peters, S. J. Ceci, Peer review practices of psychological journals: The fate of published articles, submitted again, Behavioural and Brain Sciences, 5 (1982) 187. Also see the commentary on this paper in the same issue of Behavioural and Brain Sciences.
  8. P. McReynolds, Reliability of ratings of research papers, American Psychologist, 26 (1971) 400.
  9. C. M. Bowen, R. Perloff, J. Jacoby, Improving manuscript evaluation procedures, American Psychologist, 27 (1972) 22.
  10. S. Cole, J. R. Cole, G. Simon, Chance and consensus in peer review, Science, 214 (1981) 881.
  11. D. Klahr, Insiders, outsiders and efficiency in a National Science Foundation panel, American Psychologist, 40 (1985) 148.
  13. Of course, referees' evaluations may show agreement because they reflect other variables besides merit; “reliability is not the same as validity.” To our knowledge, researchers have reported only two studies of the association between scientists' evaluations of papers and independent indicators of those papers' merit. Small reported data on original referees' assessments of a sample of highly cited papers in chemistry, which showed a nonsignificant correlation between the referee assessments and the subsequent citation levels (see H. G. Small, Characteristics of Frequently Cited Papers in Chemistry. Final Report on NSF Contract NSF-C795, Philadelphia, 1974). In contrast, Gottfredson found more substantial positive correlations between citations to published psychology papers and overall judgments of those papers' quality and impact made by experts nominated by the papers' authors (see S. D. Gottfredson, Evaluating psychological research reports: Dimensions, reliability, and correlates of quality judgments, American Psychologist, 33 (1978) 920).
  14. M. J. Mahoney, Scientist as Subject: The Psychological Imperative, Cambridge, Mass., Ballinger, 1976.
  15. D. Lindsey, The Scientific Publication System in Social Science, San Francisco, Jossey-Bass, 1978.
  16. L. L. Hargens, Scholarly consensus and journal rejection rates, American Sociological Review, 53 (1988) 139.
  17. D. Lindsey, Assessing precision in the manuscript review process: A little better than chance, Scientometrics, 14 (1988) 75.
  18. See, for example, H. M. Blalock, Jr., Social Statistics, New York, McGraw-Hill, 1979.
  19. A. W. Ward, B. W. Hall, C. F. Schram, Evaluation of published educational research, a national study, American Educational Research Journal, 12 (1975) 109.
  21. Unrepresentatively homogeneous samples of papers are also produced when editors summarily reject a large proportion of submissions. To the extent that editors screen out manuscripts that referees would judge to be of poor quality, studies based on the remaining papers that receive referee evaluations will tend to show low levels of agreement between referees. High-prestige multidisciplinary journals, high-prestige medical journals, and social science journals are most likely to exhibit high summary rejection rates. See M. D. Gordon, A Study of the Evaluation of Research Papers by Primary Journals in the U.K., Leicester, England: Primary Communications Research Center, University of Leicester, 1978.
  22. W. A. Scott, Interreferee agreement on some characteristics of manuscripts submitted to the Journal of Personality and Social Psychology, American Psychologist, 29 (1974) 698.
  23. L. L. Hargens, J. R. Herting, A new approach to referees' assessments of manuscripts, Social Science Research (forthcoming).
  24. Lindsey, op. cit., reference 17 above, suggests that they are likely to be lower, but he seems to base his judgment on results from the numerous studies that have been subject to truncated variation rather than on those studies that have been based on more representative samples of manuscripts.
  25. H. L. Roediger III, The role of journal editors in the scientific process, in D. N. Jackson, J. P. Rushton (Eds), Scientific Excellence: Origins and Assessment, Beverly Hills, CA: Sage, 1987, 222.
  26. B. C. Griffith, Judging document content versus social functions of refereeing: Possible and impossible tasks, Behavioural and Brain Sciences, 5 (1982) 214.
  27. T. Saracevic, Relevance: A review of and framework for thinking on the notion in information science, Journal of the American Society for Information Science, 26 (1975) 321.
  28. J. C. Nunnally, Psychometric Theory, New York, McGraw-Hill, 1967.
  29. H. E. A. Tinsley, D. J. Weiss, Interrater reliability and agreement of subjective judgments, Journal of Counselling Psychology, 22 (1975) 358.
  30. A. L. Stinchcombe, R. Ofshe, On journal editing as a probabilistic process, American Sociologist, 5 (1969) 19.
  31. Hargens (footnote 12 in op. cit. in reference 16 above) also made this error.
  32. Op. cit. reference 15, p. 37. Op. cit. reference 17, D. Lindsey, Assessing precision in the manuscript review process: A little better than chance, Scientometrics, 14 (1988) p. 78.
  34. These results illustrate the point that measures with low reliability, and therefore low validity, can be valuable when selection ratios are low and there is substantial variation among cases being evaluated (see L. J. Cronbach, Essentials of Psychological Testing (3rd Ed.), New York, Harper and Row, 1970). Lindsey and others have argued that referees' evaluations of manuscripts are more likely to be unreliable for behavioural science journals than for natural science journals, but the former are also more likely to exhibit the two conditions that enhance the practical value of even fairly unreliable evaluations. Highly selective and prestigious medical journals also exhibit these two conditions, and Lock, op. cit., estimates that the British Medical Journal accepts 80 percent of the top quality papers submitted to it.
  36. Lindsey's recommendation that journals solicit the opinions of at least three initial referees for each submission also exaggerates the benefits of such a policy by neglecting the peer-review system used by journals. For most behavioural-science journals, which are the focus of Lindsey's discussion, a substantial minority of manuscripts (those receiving a split decision from the two initial referees and even some of those receiving two positive evaluations) already receive three referee evaluations. Using three initial referees for all papers will increase the reliability of the composite evaluations of only those papers that would receive two unfavourable evaluations under the current system. Unfortunately, using three initial referees for these papers would also slow down their evaluation, and authors appear to be more concerned about the speed of the journal review process than about the reliability of referees' evaluations (see Y. Brackbill, F. Korten, Journal reviewing practices: Authors' and APA members' suggestions for revision, American Psychologist, 27 (1972) 22). Using three initial referees might speed up the evaluation of the remaining manuscripts somewhat (because editors would not wait until the first two referees returned recommendations before soliciting the opinion of the third), but because these constitute a minority of submissions, the time saved on them would probably not counterbalance the longer lags in evaluating the very large proportion of manuscripts that receive only two evaluations under the current system.
  37. See Blalock, op. cit., pp. 282–290.
  38. Lindsey (op. cit., reference 17) reports a non-significant chi-squared value for a “quasi-independence” model applied to data from one of these journals, Personality and Social Psychology Bulletin. Unfortunately, Lindsey does not specify which model of quasi-independence he tested. We have been able to obtain the chi-squared value he reports only by (1) treating the P&PB data in Lindsey's Table 2 as frequencies (they are actually percentages) and (2) constraining the model to reproduce the entries along the diagonal of Lindsey's Table 2 (some of these entries represent disagreement and others represent agreement). Thus, it is doubtful that Lindsey's analysis tested any meaningful hypothesis, much less the null hypothesis that referees' judgments are statistically independent.
  39. L. A. Goodman, New methods for analyzing the intrinsic character of qualitative variables using cross-classified data, American Journal of Sociology, 93 (1987) 529.
  40. C. C. Clogg, Using association models in sociological research: Some examples, American Journal of Sociology, 88 (1982) 114.
  41. See J. R. Cole, S. Cole, Which researcher will get the grant?, Nature, 279 (1979) 575–576, and Gordon, op. cit. These measures include various estimates of the proportion of the total variance in referees' assessments that is between- or within-manuscript (or proposal) variance.
  42. See S. Cole, G. Simon, J. R. Cole, Do journal rejection rates index consensus?, American Sociological Review, 53 (1988) 152, and L. L. Hargens, Further evidence on field differences in consensus from the NSF peer review studies, American Sociological Review, 53 (1988) 157.
  43. H. A. Zuckerman, R. K. Merton, Patterns of evaluation in science: institutionalization, structure and functions of the referee system, Minerva, 9 (1971) 66.
  44. R. E. Stevens, Characteristics of Subject Literatures, ACRL Monograph No. 6, Chicago, Association of College and Reference Libraries, 1953.
  45. C. H. Brown, Scientific Serials, ACRL Monograph No. 16, Chicago, Association of College and Reference Libraries, 1956.
  46. W. D. Garvey, N. Lin, C. E. Nelson, Some comparisons of communication activities in the physical and social sciences, in: C. E. Nelson, D. K. Pollock (Eds.), Communication among Scientists and Engineers, Lexington, Mass.: Heath, 1970, p. 61.
  47. Op. cit. reference 16. One reason that studies of referee reliability are relatively rare for physical-science journals is that such journals often use the single initial referee system. Thus, data on pairs of referee assessments of all submissions are unavailable for these journals. Those manuscripts that do receive at least two independent referee evaluations under this system are an unrepresentative subset of all manuscripts. Thus, nonexperimental data on referee agreement for these journals, such as the evidence reported by Zuckerman and Merton, should be viewed with caution.
  48. W. D. Garvey, Communication: The Essence of Science, Oxford, Pergamon, 1979.

Copyright information

© Akadémiai Kiadó 1990

Authors and Affiliations

  • L. L. Hargens (1)
  • J. R. Herting (2)
  1. Department of Sociology, University of Illinois, Urbana (USA)
  2. Department of Sociology, Stanford University, Stanford (USA)