Bayesian Heatmaps: Probabilistic Classification with Multiple Unreliable Information Sources

  • Edwin Simpson
  • Steven Reece
  • Stephen J. Roberts
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10535)

Abstract

Unstructured data from diverse sources, such as social media and aerial imagery, can provide valuable up-to-date information for intelligent situation assessment. Mining these different information sources could bring major benefits to applications such as situation awareness in disaster zones and mapping the spread of diseases. Such applications depend on classifying the situation across a region of interest, which can be depicted as a spatial “heatmap”. Annotating unstructured data using crowdsourcing or automated classifiers produces individual classifications at sparse locations that typically contain many errors. We propose a novel Bayesian approach that models the relevance, error rates and bias of each information source, enabling us to learn a spatial Gaussian Process classifier by aggregating data from multiple sources with varying reliability and relevance. Our method does not require gold-labelled data and can make predictions at any location in an area of interest given only sparse observations. We show empirically that our approach can handle noisy and biased data sources, and that simultaneously inferring reliability and transferring information between neighbouring reports leads to more accurate predictions. We demonstrate our method on two real-world problems from disaster response, showing how our approach reduces the amount of crowdsourced data required and can be used to generate valuable heatmap visualisations from SMS messages and satellite images.
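
As a rough, illustrative sketch of the idea summarised above (and not the paper's variational Bayesian algorithm), the following Python snippet alternates between fitting an off-the-shelf Gaussian process classifier to a set of trusted reports and re-estimating each source's reliability against the resulting consensus, before producing a dense probability heatmap over the region of interest. The simulated sources, the scikit-learn `GaussianProcessClassifier`, the 0.6 trust threshold and all other parameters are assumptions chosen for illustration only.

```python
# Toy sketch only: jointly estimating source reliability and a spatial GP
# classifier by simple alternation. This is NOT the authors' method; all
# data, names and thresholds below are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def ground_truth(xy):
    # Hypothetical "damage" region: a disc of radius 0.3 centred at (0.5, 0.5).
    return (np.linalg.norm(xy - 0.5, axis=1) < 0.3).astype(int)

# Simulated information sources: (accuracy on positives, accuracy on negatives).
sources = [(0.9, 0.9), (0.75, 0.8), (0.5, 0.5)]   # the last source is pure noise
X, y_rep, src = [], [], []
for s, (acc_pos, acc_neg) in enumerate(sources):
    xy = rng.random((120, 2))                     # sparse report locations
    t = ground_truth(xy)
    p_correct = np.where(t == 1, acc_pos, acc_neg)
    X.append(xy)
    y_rep.append(np.where(rng.random(len(t)) < p_correct, t, 1 - t))
    src.append(np.full(len(t), s))
X, y_rep, src = np.vstack(X), np.concatenate(y_rep), np.concatenate(src)

# Alternate between fitting a spatial GP classifier to the currently trusted
# reports and re-scoring each source's agreement with the GP consensus.
trusted = np.ones(len(y_rep), dtype=bool)
gp = GaussianProcessClassifier(kernel=RBF(length_scale=0.2))
for _ in range(3):
    gp.fit(X[trusted], y_rep[trusted])
    consensus = gp.predict(X)
    reliability = np.array([(y_rep[src == s] == consensus[src == s]).mean()
                            for s in range(len(sources))])
    trusted = reliability[src] > 0.6              # drop near-random sources
print("estimated per-source reliability:", np.round(reliability, 2))

# Dense heatmap of P(damage) over the unit square.
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
heatmap = gp.predict_proba(np.c_[gx.ravel(), gy.ravel()])[:, 1].reshape(50, 50)
```

In the paper's approach, source error rates and biases are inferred within the Bayesian model itself rather than via the hard trust threshold used in this sketch.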

Acknowledgments

We thank Brooke Simmons at the Planetary Response Network for invaluable support and data. This work was funded by the EPSRC ORCHID programme grant (EP/I011587/1).


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Edwin Simpson (1, 2)
  • Steven Reece (2)
  • Stephen J. Roberts (2)
  1. Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
  2. Department of Engineering Science, University of Oxford, Oxford, UK
