Three Criteria for Consensus Conferences

Abstract

Consensus conferences are social techniques that involve bringing together a group of scientific experts, and sometimes also non-experts, in order to increase the public role in science and related policy, to amalgamate diverse and often contradictory evidence for a hypothesis of interest, and to achieve scientific consensus, or at least the appearance of consensus among scientists. For consensus conferences that set out to amalgamate evidence, I propose three desiderata: Inclusivity (the consideration of all available evidence), Constraint (the achievement of some agreement among intersubjective assessments of the hypothesis of interest), and Evidential Complexity (the evaluation of available evidence based on a plurality of relevant evidential criteria). Two examples suggest that consensus conferences can readily satisfy Inclusivity and Evidential Complexity, but they do not as easily satisfy Constraint. I end by discussing the relation between social inclusivity and the three desiderata.

Notes

  1. Robustness-style arguments have been frequently appealed to as grounds for objectivity; concordant multimodal evidence has been seen as a way to avoid worries about the fallibility of single modes of evidence and as a way to resist skeptical arguments. See discussions of robustness (or synonyms) in Wimsatt (1981), Cartwright (1983), Salmon (1984), Culp (1994), Chang (2004), Weber (2005), Kosso (2006), Stegenga (2009), Kuorikoski et al. (2010), and Stegenga (2011a).

  2. The process of bringing together experts in an attempt to resolve disagreement and settle on a fact of the matter is probably as old as organized humanity. One of the more infamous examples of a consensus conference is the 1616 meeting of the commission of theologians, or Qualifiers, who came to a formal consensus that the hypothesis of a moving earth is “foolish and absurd in philosophy” (see Westman 2011).

  3. However, some advocates of deliberative approaches to amalgamating evidence have been critical of formal methods of evidence amalgamation. A long-time critic of formal amalgamation methods such as meta-analysis has suggested that personal judgment is necessary to properly amalgamate evidence:

    A good review is based on intimate personal knowledge of the field, the participants, the problems that arise, the reputation of different laboratories, the likely trustworthiness of individual scientists, and other partly subjective but extremely relevant considerations. Meta-analysis rules out any such subjective factors. (Eysenck 1994)

    For a critical account of meta-analysis, see Stegenga (2011b).

  4. As of 2007 the NIH had produced 118 consensus statements (Solomon 2007).

  5. Not much, I think, should be placed on this distinction. Often the policy implications of an epistemic conclusion are clear to the participants of a consensus conference; it is usually policy makers who organize consensus conferences; and policies themselves involve predictions on some epistemic basis or other. Thus, like the Danish model, the U.S. model is often employed for guidance with policy formulation, albeit perhaps less directly.

  6. This problem does not arise in contexts in which there is an independent indicator of the truth. An anonymous reviewer suggests that in situations in which evidence is amalgamated in order to make predictions, we have such an independent indicator of the truth, since the AM can be tested against the frequency with which its predictions are borne out. However, often in the contexts in which AMs are used, the track record of an AM can only be evaluated by appeal to further evidence relevant to the hypothesis in question, and when such new evidence is itself inconclusive (which is ubiquitous in such contexts) the above circular argument applies. For instance, suppose our hypothesis (H) is “drug \(x\) alleviates symptoms \(y\)”, and we use an AM to amalgamate the available evidence regarding H, and then come to affirm H as probable. Further suppose that H warrants a prediction that if \(x\) were to be used in clinical practice it would alleviate \(y\). We then use \(x\) in clinical practice with the hope that it alleviates \(y\). But the evidence regarding H that becomes available from the use of \(x\) in clinical practice is only one kind of evidence relevant to H, and indeed such evidence is, in some widely recognized respects, inferior to the initial evidence that was amalgamated by the AM in the first place (because, for example, evidence from clinical practice is not controlled, and is liable to confounding by expectation bias and confirmation bias). So even after the prediction is made based on H and evidence is gathered about the prediction, our epistemic state regarding H is not different in kind from what it was prior to the prediction, and the inferior evidence gathered after the prediction cannot be an arbiter of the veracity of the AM.

  7. Even this, though, is overly optimistic: elsewhere I argue that even when assessing a single mode of evidence, constraint is not necessarily achieved, because there are numerous features of evidence that must be assessed, which can be variably (but rationally) prioritized.

  8. A caveat: much hinges on what evidence is deemed ‘relevant’, and this is often a matter of dispute.

  9. Criticisms of this include Worrall (2002), Cartwright (2007), and Worrall (2007). See also my discussion of the relation between social inclusivity and (I) in §4 below.

  10. If a method provides information that is no more reliable than a randomizer, then such information should not be considered ‘evidence’. If two methods are both somewhat reliable but their degrees of reliability differ, then evidence from such methods should be weighted accordingly by an AM. Elsewhere I investigate such weighting methodologies for evidence in clinical research.

  11. For examples of the plurality of features of evidence that scientists consider, see, for example, Franklin (2002).

  12. (E) is a kind of epistemic inclusiveness at the level of the plurality of features of evidence, rather than at the level of the plurality of kinds of evidence available (which is accounted for by (I)).

  13. For instance, one way to substantiate (E) would be to consider the extensive philosophical literature on experimentation, which includes, among many works, Hacking (1983), Franklin (2002), and Weber (2005).

  14. This desideratum, a kind of social inclusiveness, is distinct from my (I) above, meant to be inclusiveness of an epistemic kind only. Nevertheless, as Longino (1990) and others have argued, one way to help achieve the epistemic virtues that I am concerned with might be to guarantee social inclusiveness in the process of consensus formation. I return to the relation between social inclusivity and my three desiderata for AMs in §4.

  15. Some argue that knowledge is what an ideal epistemic community would, in the long run, eventually agree on (for instance, this is one interpretation of Peirce’s notion of convergence to the truth). Others argue that knowledge is just what an actual epistemic community settles on (see, for example, Kusch 2002), and so if intersubjective assessment of hypotheses were tightly constrained, then knowledge would be achieved. Though I will not argue the point here, since many others have done so, the conflation between consensus and knowledge should be rejected. See also Miller (2013).

  16. There is a growing body of literature concerned with the epistemic value of consensus, of which Miller (2013) is a recent valuable addition. Since the primary focus of the present paper is on consensus conferences rather than on consensus per se, I avoid an exposition of this literature, but for a sampling, see also Gilbert (1987), Tuomela (1992), Wray (2001), and Tucker (2003).

  17. However, for a critique of the assumed epistemic value of consilience, see Stegenga (2009). Consilience is often called ‘robustness’ (see also footnote 1).

  18. The consensus achieved by the Intergovernmental Panel on Climate Change could be described as an example of achieved constraint despite discordant evidence.

  19. Though Solomon (2007) notes that consensus conferences have been assessed based on their freedom from bias: by the Rand Corporation in 1983, by a group at the University of Michigan in 1987, and by the NIH in 1999.

  20. However, there were some experiments performed on prisoner ‘volunteers’ in the 1960s, with mixed results.

  21. Mathematical models have been employed to show that groups of diverse problem solvers outperform groups of high-ability problem solvers; see, for example, Hong and Page (2004). For a sociological study of the value of social inclusiveness, see Collins and Evans (2002).

  22. This view has been heavily criticized. See footnote 9 for references.

  23. Dissenting outsiders are often non-scientists, and so are not ‘insiders’ to any scientific community. But sometimes such outsiders can be respected scientists in one community and be vocal dissenters to a consensus established by another community. The HIV-AIDS deniers (those who deny that HIV causes AIDS) are a salient example. Some are fully outsiders (Thabo Mbeki, the former president of South Africa, is one of the most prominent examples). But others include Peter Duesberg, a molecular biologist at the University of California, Berkeley, and Kary Mullis, winner of a Nobel Prize in chemistry. This raises the question: among professional scientists, what constitutes membership in this or that community? A cynical Kuhn-inspired answer might be: assent to the hypothesis about which consensus is in question. A less cynical answer might be: performing active research on that hypothesis. Duesberg and Mullis were outsiders to AIDS research on either answer. See van Rijn (2006).

  24. Solomon (2007) gives several examples of such consensus conferences, including a 1994 conference titled “Helicobacter Pylori in Peptic Ulcer Disease” and a 2002 conference titled “Management of Hepatitis C”. Both of these conferences appeared to achieve (C), but in fact the conferences took place some time after the relevant scientific communities had already achieved consensus. For a criticism of Solomon’s argument, see Kosolosky (2012).

  25. I do not mean to suggest that it is simple to avoid the subtle biases that arise in group deliberative processes. Janis (1982) argued that groups are liable to come to incorrect conclusions in certain circumstances: peer pressure and authoritative pressure stifle dissent and quiet the discussion of discordant evidence. In contrast, Tollefsen (2006) argues that scientists can engage in collaborative deliberation without engaging in groupthink or stifling dissent.

References

  • Beatty, J. (2006). Masking disagreement among scientific experts. Episteme, 3, 52–67.

  • Cartwright, N. (2006). Well-ordered science: Evidence for use. Philosophy of Science, 73, 981–990.

  • Cartwright, N. (2007). Are RCTs the gold standard? Biosocieties, 2, 11–20.

  • Cartwright, N. (1983). How the laws of physics lie. Oxford: Clarendon Press.

  • Chang, H. (2004). Inventing temperature. Oxford: Oxford University Press.

  • Collins, H. M., & Evans, R. (2002). The third wave of science studies: Studies of expertise and experience. Social Studies of Science, 32(2), 235–296.

  • Culp, S. (1994). Defending robustness: The bacterial mesosome as a test case. PSA, 1, 46–57.

  • Douglas, H. (2005). Inserting the public into science. In S. Maasen, & P. Weingart (Eds.), Democratization of expertise? Exploring novel forms of scientific advice in political decision-making. Netherlands: Springer.

  • Eysenck, H. (1994). Systematic reviews: Meta-analysis and its problems. British Medical Journal, 309, 789–792.

  • Franklin, A. (2002). Selectivity and discord: Two problems of experiment. Pittsburgh: University of Pittsburgh Press.

  • Gilbert, M. (1987). Modeling collective belief. Synthese, 73(1), 185–204.

  • Hong, L., & Page, S. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences, 101(46), 16385–16389.

  • Janis, I. (1982). Groupthink: Psychological studies of policy decisions and fiascoes. Houghton Mifflin.

  • Joss, S., & Durant, J. (Eds.). (1995). Public participation in science: The role of consensus conferences in Europe. UK: Science Museum.

  • Klein R., & Williams, A. (2000). Setting priorities: what is holding us back—inadequate information or inadequate institutions? In A. Coulter, & C. Ham (Eds.), The global challenge of health care rationing. Buckingham: Open University Press.

  • Kosolosky, L. (2012). The Intended window of epistemic opportunity: A comment on Miriam Solomon. In B. Van Kerkhove, T. Libert, G. Vanpaemel, & P. Marage, (Eds.), Logic, philosophy and history of science in Belgium II. Koninklijke Vlaamse Academie van België.

  • Kosso, P. (2006). Detecting extrasolar planets. Studies in History and Philosophy of Science, 37, 224–236.

  • Kramer, P. (2011). In defense of antidepressants. New York Times July 9.

  • Kuorikoski, J., Lehtinen, A., & Marchionni, C. (2010). Economic modeling as robustness analysis. The British Journal for the Philosophy of Science, 61, 541–567.

  • Kusch, M. (2002). Knowledge by agreement: The programme of communitarian epistemology. Oxford: Clarendon Press.

  • Lomas, J., Fulop, N., Gagnon, D., & Allen, P. (2003). On being a good listener: Setting priorities for applied health services research. Milbank Quarterly, 81(3), 363–388.

  • Marmot, M. (2004). The status syndrome. New York: Times Books.

  • Mill, J. S. (1859). On liberty.

  • Miller, B. (2013). When is consensus knowledge-based? Distinguishing shared knowledge from mere agreement. Synthese, 190, 1293–1316.

  • Salmon, W. (1984). Scientific explanation and the causal structure of the world. Princeton: Princeton University Press.

  • Solomon, M. (2007). The social epistemology of NIH consensus conferences. In H. Kincaid, & J. McKitrick (Eds.), Establishing medical reality: Essays in the metaphysics and epistemology of biomedical science. Springer.

  • Solomon, M. (2006). Groupthink versus the Wisdom of Crowds: The social epistemology of deliberation and dissent. The Southern Journal of Philosophy, 44, 28–42.

  • Stegenga, J. (2009). Robustness, discordance, and relevance. Philosophy of Science, 76, 650–661.

  • Stegenga, J. (2011a). An impossibility theorem for amalgamating evidence. Synthese, 190, 2391–2411.

  • Stegenga, J. (2011b). Is meta-analysis the platinum standard? Studies in History and Philosophy of Biological and Biomedical Sciences, 42(4), 497–507.

  • Tollefsen, D. P. (2006). Group deliberation, social cohesion, and scientific teamwork: Is there room for dissent? Episteme, 3, 37–51.

  • Tucker, A. (2003). The epistemic significance of consensus. Inquiry, 46(4), 501–521.

  • Tuomela, R. (1992). Group beliefs. Synthese, 91(3), 285–318.

  • van Rijn, K. (2006). The politics of uncertainty: The AIDS debate, Thabo Mbeki and the South African government response. Social History of Medicine, 19(3), 521–538.

  • Weber, M. (2005). Philosophy of experimental biology. Cambridge: Cambridge University Press.

  • Westman, R. (2011). The Copernican question: Prognostication, skepticism, and celestial order. Berkeley: University of California Press.

  • Wimsatt, W. (1981). Robustness, reliability, and overdetermination. In M. B. Brewer, & B. E. Collins (Eds.), Scientific inquiry and the social sciences. Jossey-Bass.

  • Worrall, J. (2007). Why there’s no cause to randomize. British Journal for the Philosophy of Science, 58, 451–488.

  • Worrall, J. (2002). What evidence in evidence-based medicine? Philosophy of Science, 69, S316–S330.

  • Wray, K. B. (2001). Collective belief and acceptance. Synthese, 129(3), 319–333.

Acknowledgments

I am grateful to Nancy Cartwright, Boaz Miller, Alex Broadbent, Laszlo Kosolosky, Miriam Solomon, Anton Froeyman, Jeroen Van Bouwel, Heather Douglas, and two anonymous reviewers for detailed feedback on versions of this paper. Financial support was provided by the Banting Postdoctoral Fellowships Program administered by the Social Sciences and Humanities Research Council of Canada.

Correspondence to Jacob Stegenga.

Additional information

Forthcoming in Foundations of Science.

Cite this article

Stegenga, J. Three Criteria for Consensus Conferences. Found Sci 21, 35–49 (2016). https://doi.org/10.1007/s10699-014-9374-y
