Abstract
The benefits of judgment aggregation are intuitive and well-documented. By combining the input of several judges, practitioners may enhance information sharing and signal strength while cancelling out biases and noise. The resulting judgment is more accurate than the average accuracy of the individual judgments, a phenomenon known as the wisdom of crowds. Although an unweighted arithmetic average is often sufficient to improve judgment accuracy, sophisticated performance-weighting methods have been developed to improve accuracy further. By weighting the judges according to (1) past performance on similar tasks, (2) performance on closely related tasks, and/or (3) the internal consistency (or coherence) of judgments, practitioners can exploit individual differences in probabilistic judgment skill to ferret out bona fide experts within the crowd. Each method has proven useful, with associated benefits and potential drawbacks. In this chapter, we review the evidence for and against performance-weighting strategies, discussing the circumstances in which they are appropriate and beneficial to apply. We describe how to implement these methods, with a focus on mathematical functions and formulas that translate performance metrics into aggregation weights.
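To make the contrast between unweighted and performance-weighted aggregation concrete, the following is a minimal sketch rather than the chapter's own procedure. It assumes that past performance is summarized by a mean Brier score and that weights are proportional to the inverse of that score; the weighting function, the function names, and all numbers are illustrative assumptions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def performance_weights(scores, eps=1e-9):
    """Hypothetical weighting function: weights proportional to 1 / Brier score.

    Lower (better) scores receive larger weights; weights are normalized to sum to 1.
    """
    inverse = [1.0 / (s + eps) for s in scores]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_average(judgments, weights):
    """Linear opinion pool: weighted sum of the judges' probabilities."""
    return sum(w * j for w, j in zip(weights, judgments))


# Made-up history: three judges forecasting four resolved binary questions.
outcomes = [1, 0, 1, 1]
history = {
    "judge_a": [0.9, 0.1, 0.8, 0.7],
    "judge_b": [0.6, 0.4, 0.5, 0.6],
    "judge_c": [0.3, 0.7, 0.4, 0.2],
}
scores = [brier_score(f, outcomes) for f in history.values()]
weights = performance_weights(scores)

# New question: each judge's probability that the event occurs.
new_judgments = [0.85, 0.60, 0.30]
unweighted = sum(new_judgments) / len(new_judgments)
weighted = weighted_average(new_judgments, weights)

print(f"weights    = {[round(w, 3) for w in weights]}")
print(f"unweighted = {unweighted:.3f}, weighted = {weighted:.3f}")
```

In this toy example the weighted aggregate moves toward the judge with the best track record. Whether that improves accuracy in practice depends on how well past performance predicts future performance.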
Change history
26 September 2023
A correction has been published.
Notes
1. As a postscript, we note that these examples, which now appear dated, were incorporated roughly one month prior to Russia’s invasion of Ukraine.
2. By independence we mean that judges operate independently of each other and cannot influence their peers’ judgments. This does not imply that their judgments are uncorrelated. In fact, they can be highly correlated simply because all the judges rely on similar information when formulating their individual opinions (e.g., Broomell & Budescu, 2009).
3. To the extent that the aggregator can predict and correct for bias, the aggregator can relax this definition and simply model the true expert based on the elicited judgment.
4. Technically, the median is an (extremely) trimmed mean in which one removes the lowest 50% and the highest 50% of observations, so all trimmed means can be thought of as compromises between the mean and the median.
5. Ties are assigned the average of their collective ranks; e.g., three participants tied for first would each receive a rank of \( \frac{1+2+3}{3} = 2 \).
6.
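As an illustration of Notes 4 and 5 above, the sketch below shows a trimmed mean that reduces to the median at 50% trimming per tail, and a ranking function that gives tied scores the average of their ranks. The function names and data are hypothetical, and the convention that rank 1 goes to the lowest score is an assumption made for the example.

```python
def trimmed_mean(values, trim):
    """Mean after dropping a `trim` share of observations from EACH tail.

    `trim` is between 0 (ordinary mean) and 0.5 (median).
    """
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * trim)
    # Always keep the middle one (odd n) or two (even n) observations,
    # so that trim = 0.5 yields the median.
    k = min(k, (n - 1) // 2)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


def average_ranks(scores):
    """Rank scores from 1 (lowest) upward; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


judgments = [10, 12, 13, 15, 90]        # made-up estimates with one outlier
print(trimmed_mean(judgments, 0.0))     # ordinary mean
print(trimmed_mean(judgments, 0.2))     # drops one value from each tail
print(trimmed_mean(judgments, 0.5))     # median

print(average_ranks([0.1, 0.1, 0.1, 0.4]))  # three-way tie for first
```

With these inputs the three participants tied for first each receive rank 2, matching the arithmetic in Note 5.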
References
Afflerbach, P., van Dun, C., Gimpel, H., Parak, D., & Seyfried, J. (2021). A simulation-based approach to understanding the wisdom of crowds phenomenon in aggregating expert judgment. Business & Information Systems Engineering, 63(4), 329–348. https://doi.org/10.1007/s12599-020-00664-x
Armstrong, J. S. (2001). Combining forecasts. In Principles of forecasting: A handbook for researchers and practitioners (1st ed., p. 21). Kluwer Academic Publishers.
Aspinall, W. (2010). A route to more tractable expert advice. Nature, 463(7279), 294–295. https://doi.org/10.1038/463294a
Atanasov, P., Rescober, P., Stone, E., Swift, S. A., Servan-Schreiber, E., Tetlock, P., Ungar, L., & Mellers, B. (2017). Distilling the wisdom of crowds: Prediction markets vs. prediction polls. Management Science, 63(3), 691–706. https://doi.org/10.1287/mnsc.2015.2374
Bamber, J. L., Oppenheimer, M., Kopp, R. E., Aspinall, W. P., & Cooke, R. M. (2019). Ice sheet contributions to future sea-level rise from structured expert judgment. Proceedings of the National Academy of Sciences, 116(23), 11195–11200. https://doi.org/10.1073/pnas.1817205116
Baron, J. (1985). Rationality and intelligence. Cambridge University Press. https://doi.org/10.1017/CBO9780511571275
Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). Two reasons to make aggregated probability forecasts more extreme. Decision Analysis, 11(2), 133–145. https://doi.org/10.1287/deca.2014.0293
Benjamin, D., Mandel, D. R., & Kimmelman, J. (2017). Can cancer researchers accurately judge whether preclinical reports will reproduce? PLoS Biology, 15(6), 1–17. https://doi.org/10.1371/journal.pbio.2002212
Benjamin, D., Mandel, D. R., Barnes, T., Krzyzanowska, M. K., Leighl, N. B., Tannock, I. F., & Kimmelman, J. (2021). Can oncologists predict the efficacy of treatment in randomized trials? The Oncologist, 26, 56–62. https://doi.org/10.1634/theoncologist.2020-0054
Benjamin, D. M., Hey, S. P., MacPherson, A., Hachem, Y., Smith, K. S., Zhang, S. X., Wong, S., Dolter, S., Mandel, D. R., & Kimmelman, J. (2022). Principal investigators over-optimistically forecast scientific and operational outcomes for clinical trials. PLoS One, 17(2), e0262862. https://doi.org/10.1371/journal.pone.0262862
Bickel, J. E. (2007). Some comparisons among quadratic, spherical, and logarithmic scoring rules. Decision Analysis, 4(2), 49–65. https://doi.org/10.1287/deca.1070.0089
Bolger, F., & Wright, G. (1994). Assessing the quality of expert judgment. Decision Support Systems, 11(1), 1–24. https://doi.org/10.1016/0167-9236(94)90061-2
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3. https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
Broomell, S., & Budescu, D. V. (2009). Why are experts correlated? Decomposing correlations between judges. Psychometrika, 74(3), 531–553. https://doi.org/10.1007/s11336-009-9118-z
Budescu, D. V., & Chen, E. (2015). Identifying expertise to extract the wisdom of crowds. Management Science, 61(2), 267–280. https://doi.org/10.1287/mnsc.2014.1909
Budescu, D. V., Himmelstein, M., & Ho, E. (2021, October). Boosting the wisdom of crowds with social forecasts and coherence measures. Presented at the annual meeting of the Society of Multivariate Experimental Psychology (SMEP), online.
Chen, E., Budescu, D. V., Lakshmikanth, S. K., Mellers, B. A., & Tetlock, P. E. (2016). Validating the contribution-weighted model: Robustness and cost-benefit analyses. Decision Analysis, 13(2), 128–152. https://doi.org/10.1287/deca.2016.0329
Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5(4), 559–583.
Clemen, R. T., & Winkler, R. L. (1999). Combining probability distributions from experts in risk analysis. Risk Analysis, 19(2), 187–203. https://doi.org/10.1111/j.1539-6924.1999.tb00399.x
Collins, R. N., Mandel, D. R., Karvetski, C. W., Wu, C. M., & Nelson, J. D. (in press). The wisdom of the coherent: Improving correspondence with coherence-weighted aggregation. Decision.
Colson, A. R., & Cooke, R. M. (2018). Expert elicitation: Using the classical model to validate experts’ judgments. Review of Environmental Economics and Policy, 12(1), 113–132. https://doi.org/10.1093/reep/rex022
Cooke, R. M. (1991). Experts in uncertainty: Opinion and subjective probability in science. Oxford University Press.
Cooke, R. M. (2014). Validating expert judgment with the classical model. In C. Martini & M. Boumans (Eds.), Experts and consensus in social science (Vol. 50, pp. 191–212). Springer. https://doi.org/10.1007/978-3-319-08551-7_10
Cooke, R. M., & Goossens, L. L. H. J. (2008). TU Delft expert judgment data base. Reliability Engineering & System Safety, 93(5), 657–674. https://doi.org/10.1016/j.ress.2007.03.005
Cooke, R., Mendel, M., & Thijs, W. (1988). Calibration and information in expert resolution; a classical approach. Automatica, 24(1), 87–93. https://doi.org/10.1016/0005-1098(88)90011-8
Davis-Stober, C. P., Budescu, D. V., Dana, J., & Broomell, S. B. (2014). When is a crowd wise? Decision, 1(2), 79–101. https://doi.org/10.1037/dec0000004
de Finetti, B. (1937). La prévision: Ses lois logiques, ses sources subjectives. Annales de l’Institut Henri Poincaré, 7, 1–68.
de Finetti, B. (1962). Does it make sense to speak of “good probability appraisers”? In I. J. Good (Ed.), The scientist speculates: An anthology of partly-baked ideas (pp. 357–363). Wiley.
Dietrich, F., & List, C. (2017). Probabilistic opinion pooling (Vol. 1). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199607617.013.37
Dunwoody, P. T. (2009). Theories of truth as assessment criteria in judgment and decision making. Judgment and Decision Making, 4(2), 116–125. https://doi.org/10.1017/S1930297500002540
Eggstaff, J. W., Mazzuchi, T. A., & Sarkani, S. (2014). The effect of the number of seed variables on the performance of Cooke’s classical model. Reliability Engineering & System Safety, 121, 72–82. https://doi.org/10.1016/j.ress.2013.07.015
Fan, Y., Budescu, D. V., Mandel, D., & Himmelstein, M. (2019). Improving accuracy by coherence weighting of direct and ratio probability judgments. Decision Analysis, 16(3), 197–217. https://doi.org/10.1287/deca.2018.0388
Galton, F. (1907). Vox Populi. Nature, 75(1949), 450–451. https://doi.org/10.1038/075450a0
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437
Goldstein, R., Almenberg, J., Dreber, A., Emerson, J. W., Herschkowitsch, A., & Katz, J. (2008). Do more expensive wines taste better? Evidence from a large sample of blind tastings. Journal of Wine Economics, 3(1), 1–9. https://doi.org/10.22004/ag.econ.37328
Hammond, K. R. (2000). Coherence and correspondence theories in judgment and decision making. In T. Connolly, K. Hammond, & H. Arkes (Eds.), Judgment and decision making: An interdisciplinary reader (2nd ed., pp. 53–65). Cambridge University Press.
Han, Y., & Budescu, D. (2019). A universal method for evaluating the quality of aggregators. Judgment and Decision Making, 14(4), 395–411. https://doi.org/10.1017/S1930297500006094
Hanea, A. M., McBride, M. F., Burgman, M. A., & Wintle, B. C. (2018). The value of performance weights and discussion in aggregated expert judgments. Risk Analysis, 38(9), 1781–1794. https://doi.org/10.1111/risa.12992
Hanea, A. M., Wilkinson, D. P., McBride, M., Lyon, A., van Ravenzwaaij, D., Singleton Thorn, F., Gray, C., Mandel, D. R., Willcox, A., Gould, E., Smith, E. T., Mody, F., Bush, M., Fidler, F., Fraser, H., & Wintle, B. C. (2021). Mathematically aggregating experts’ predictions of possible futures. PLoS One, 16(9), e0256919. https://doi.org/10.1371/journal.pone.0256919
Haran, U., Moore, D. A., & Morewedge, C. K. (2010). A simple remedy for overprecision in judgment. Judgment and Decision Making, 5, 467–476. https://doi.org/10.1017/S1930297500001637
Hastie, R., & Kameda, T. (2005). The robust beauty of majority rules in group decisions. Psychological Review, 112(2), 494–508. https://doi.org/10.1037/0033-295X.112.2.494
Hemming, V., Hanea, A. M., Walshe, T., & Burgman, M. A. (2020). Weighting and aggregating expert ecological judgments. Ecological Applications, 30(4), e02075. https://doi.org/10.1002/eap.2075
Herzog, S. M., & Hertwig, R. (2014). Harnessing the wisdom of the inner crowd. Trends in Cognitive Sciences, 18(10), 504–506. https://doi.org/10.1016/j.tics.2014.06.009
Himmelstein, M., Atanasov, P., & Budescu, D. V. (2021). Forecasting forecaster accuracy: Contributions of past performance and individual differences. Judgment and Decision Making, 16(2), 323–362. https://doi.org/10.1017/S1930297500008597
Himmelstein, M., Budescu, D. V., & Han, Y. (2022). The wisdom of timely crowds. In M. Seifert (Ed.), Judgment in predictive analytics (1st ed.). Springer Nature.
Ho, E. H. (2020, June). Developing and validating a method of coherence-based judgment aggregation. Unpublished PhD dissertation. Fordham University.
Jaspersen, J. G. (2021). Convex combinations in judgment aggregation. European Journal of Operational Research, 299, 780–794. https://doi.org/10.1016/j.ejor.2021.09.050
Jose, V. R. R., Grushka-Cockayne, Y., & Lichtendahl, K. C., Jr. (2013). Trimmed opinion pools and the crowd’s calibration problem. Management Science, 60(2), 463–475. https://doi.org/10.1287/mnsc.2013.1781
Kahneman, D., Rosenfield, A. M., Gandhi, L., & Blaser, T. (2016). How to overcome the high, hidden cost of inconsistent decision making. Harvard Business Review, 94, 36–43. Retrieved January 28, 2022, from https://hbr.org/2016/10/noise
Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A flaw in human judgment. Little, Brown Spark.
Karvetski, C. W., Olson, K. C., Mandel, D. R., & Twardy, C. R. (2013). Probabilistic coherence weighting for optimizing expert forecasts. Decision Analysis, 10(4), 305–326. https://doi.org/10.1287/deca.2013.0279
Karvetski, C. W., Mandel, D. R., & Irwin, D. (2020). Improving probability judgment in intelligence analysis: From structured analysis to statistical aggregation. Risk Analysis, 40(5), 1040–1057. https://doi.org/10.1111/risa.13443
Kolmogorov, A. N. (1956). Foundations of the theory of probability. (N. Morrison, Trans.; 2nd English Edition). Chelsea Publishing Company.
Larrick, R. P., & Soll, J. B. (2006). Intuitions about combining opinions: Misappreciation of the averaging principle. Management Science, 52(1), 111–127. https://doi.org/10.1287/mnsc.1050.0459
Larrick, R. P., Mannes, A. E., & Soll, J. B. (2011). The social psychology of the wisdom of crowds. In J. I. Krueger (Ed.), Social judgment and decision making (pp. 227–242). Psychology Press.
Lorenz, J., Rauhut, H., Schweitzer, F., & Helbing, D. (2011). How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences, 108(22), 9020–9025. https://doi.org/10.1073/pnas.1008636108
Makridakis, S., & Winkler, R. L. (1983). Averages of forecasts: Some empirical results. Management Science, 29(9), 987–996. https://doi.org/10.1287/mnsc.29.9.987
Mandel, D. R., & Barnes, A. (2014). Accuracy of forecasts in strategic intelligence. Proceedings of the National Academy of Sciences, 111(30), 10984–10989. https://doi.org/10.1073/pnas.1406138111
Mandel, D. R., & Barnes, A. (2018). Geopolitical forecasting skill in strategic intelligence. Journal of Behavioral Decision Making, 31(1), 127–137. https://doi.org/10.1002/bdm.2055
Mandel, D. R., & Kapler, I. V. (2018). Cognitive style and frame susceptibility in decision-making. Frontiers in Psychology, 9, 1461. https://doi.org/10.3389/fpsyg.2018.01461
Mandel, D. R., Karvetski, C. W., & Dhami, M. K. (2018). Boosting intelligence analysts’ judgment accuracy: What works, what fails? Judgment and Decision Making, 13(6), 607–621. https://doi.org/10.1017/S1930297500006628
Mannes, A. E., Soll, J. B., & Larrick, R. P. (2014). The wisdom of select crowds. Journal of Personality and Social Psychology, 107(2), 276–299. https://doi.org/10.1037/a0036677
Martins, J. R. R. A., & Ning, A. (2021). Engineering design optimization (1st ed.). Cambridge University Press. https://doi.org/10.1017/9781108980647
Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., Scott, S. E., Moore, D., Atanasov, P., Swift, S. A., Murray, T., Stone, E., & Tetlock, P. E. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25(5), 1106–1115. https://doi.org/10.1177/0956797614524255
Mellers, B., Stone, E., Atanasov, P., Rohrbaugh, N., Metz, S. E., Ungar, L., Bishop, M. M., Horowitz, M., Merkle, E., & Tetlock, P. (2015). The psychology of intelligence analysis: Drivers of prediction accuracy in world politics. Journal of Experimental Psychology: Applied, 21(1), 1–14. https://doi.org/10.1037/xap0000040
Mellers, B. A., Baker, J. D., Chen, E., Mandel, D. R., & Tetlock, P. E. (2017). How generalizable is good judgment? A multi-task, multi-benchmark study. Judgment and Decision Making, 12(4), 369–381. https://doi.org/10.1017/S1930297500006240
Osherson, D., & Vardi, M. Y. (2006). Aggregating disparate estimates of chance. Games and Economic Behavior, 56(1), 148–173. https://doi.org/10.1016/j.geb.2006.04.001
Park, S., & Budescu, D. V. (2015). Aggregating multiple probability intervals to improve calibration. Judgment and Decision Making, 10(2), 130–143. https://doi.org/10.1017/S1930297500003910
Peterson, W., Birdsall, T., & Fox, W. (1954). The theory of signal detectability. Transactions of the IRE Professional Group on Information Theory, 4(4), 171–212. https://doi.org/10.1109/TIT.1954.1057460
Predd, J. B., Osherson, D. N., Kulkarni, S. R., & Poor, H. V. (2008). Aggregating probabilistic forecasts from incoherent and abstaining experts. Decision Analysis, 5(4), 177–189. https://doi.org/10.1287/deca.1080.0119
Predd, J. B., Seiringer, R., Lieb, E. H., Osherson, D. N., Poor, H. V., & Kulkarni, S. R. (2009). Probabilistic coherence and proper scoring rules. IEEE Transactions on Information Theory, 55(10), 4786–4792. https://doi.org/10.1109/TIT.2009.2027573
Rossi, F., van Beek, P., & Walsh, T. (2006). Chapter 1—Introduction. In F. Rossi, P. van Beek, & T. Walsh (Eds.), Foundations of artificial intelligence (Vol. 2, pp. 3–12). Elsevier. https://doi.org/10.1016/S1574-6526(06)80005-2
Satopää, V. A., Salikhov, M., Tetlock, P. E., & Mellers, B. (2021). Bias, information, noise: The BIN model of forecasting. Management Science, 67(12), 7599–7618. https://doi.org/10.1287/mnsc.2020.3882
Silver, N. (2012). The signal and the noise: Why so many predictions fail—But some don’t. Penguin.
Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. Doubleday & Co.
Tetlock, P. E. (2005). Expert political judgment: How good is it? How can we know? Princeton University Press.
Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The art and science of prediction. Crown Publishers/Random House.
Tump, A. N., Pleskac, T. J., & Kurvers, R. H. J. M. (2020). Wise or mad crowds? The cognitive mechanisms underlying information cascades. Science Advances, 6(29), 1–11. https://doi.org/10.1126/sciadv.abb0266
Turner, B. M., Steyvers, M., Merkle, E. C., Budescu, D. V., & Wallsten, T. S. (2014). Forecast aggregation via recalibration. Machine Learning, 95(3), 261–289. https://doi.org/10.1007/s10994-013-5401-4
Wallsten, T. S., & Budescu, D. V. (1983). State of the art—Encoding subjective probabilities: A psychological and psychometric review. Management Science, 29(2), 151–173. https://doi.org/10.1287/mnsc.29.2.151
Wallsten, T. S., & Diederich, A. (2001). Understanding pooled subjective probability estimates. Mathematical Social Sciences, 41(1), 1–18. https://doi.org/10.1016/S0165-4896(00)00053-6
Wang, G., Kulkarni, S. R., Poor, H. V., & Osherson, D. N. (2011a). Improving aggregated forecasts of probability. In 2011 45th annual conference on information sciences and systems (pp. 1–5). https://doi.org/10.1109/CISS.2011.5766208
Wang, G., Kulkarni, S. R., Poor, H. V., & Osherson, D. N. (2011b). Aggregating large sets of probabilistic forecasts by weighted coherent adjustment. Decision Analysis, 8(2), 128–144. https://doi.org/10.1287/deca.1110.0206
Weaver, E. A., & Stewart, T. R. (2012). Dimensions of judgment: Factor analysis of individual differences. Journal of Behavioral Decision Making, 25(4), 402–413. https://doi.org/10.1002/bdm.748
Weiss, D. J., Brennan, K., Thomas, R., Kirlik, A., & Miller, S. M. (2009). Criteria for performance evaluation. Judgment and Decision Making, 4(2), 164–174. https://doi.org/10.1017/S1930297500002606
Willmott, C., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30, 79–82. https://doi.org/10.3354/cr030079
Wright, G., & Ayton, P. (1987). Task influences on judgemental forecasting. Scandinavian Journal of Psychology, 28(2), 115–127. https://doi.org/10.1111/j.1467-9450.1987.tb00746.x
Yerushalmy, J. (1947). Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Reports, 62(40), 1432–1449. https://doi.org/10.2307/4586294
Copyright information
© 2023 His Majesty the King in Right of Canada as represented by Department of National Defence
About this chapter
Cite this chapter
Collins, R.N., Mandel, D.R., Budescu, D.V. (2023). Performance-Weighted Aggregation: Ferreting Out Wisdom Within the Crowd. In: Seifert, M. (eds) Judgment in Predictive Analytics. International Series in Operations Research & Management Science, vol 343. Springer, Cham. https://doi.org/10.1007/978-3-031-30085-1_7
DOI: https://doi.org/10.1007/978-3-031-30085-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30084-4
Online ISBN: 978-3-031-30085-1
eBook Packages: Business and Management, Business and Management (R0)