International Journal of Computer Vision, Volume 116, Issue 2, pp. 136–160

A Joint Gaussian Process Model for Active Visual Recognition with Expertise Estimation in Crowdsourcing

  • Chengjiang Long
  • Gang Hua
  • Ashish Kapoor

Abstract

We present a noise-resilient probabilistic model for active learning of a Gaussian process classifier from crowds, i.e., a set of noisy labelers. It explicitly models both the overall label noise and the expertise level of each individual labeler with two levels of flip models. Expectation propagation is adopted for efficient approximate Bayesian inference of our probabilistic model for classification, based on which a generalized EM algorithm is derived to estimate both the global label noise and the expertise of each individual labeler. The probabilistic nature of our model immediately allows the adoption of prediction entropy for active selection of data samples to be labeled, and active selection of high-quality labelers, based on their estimated expertise, to label the data. We apply the proposed model to four visual recognition tasks, i.e., object category recognition, multi-modal activity recognition, gender recognition, and fine-grained classification, on four datasets with real crowd-sourced labels from Amazon Mechanical Turk. The experiments clearly demonstrate the efficacy of the proposed model. In addition, we extend the proposed model with the Predictive Active Set Selection Method to speed up the active learning system, whose efficacy is verified by experiments on the first three datasets. The results show that our extended model not only maintains high accuracy but also achieves higher efficiency.
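The core ideas in the abstract (a two-level flip noise model over labels, plus entropy-driven active selection of both samples and labelers) can be illustrated with a minimal sketch. This is an assumption-laden toy parameterization, not the paper's exact model: `flip_likelihood`, `epsilon` (global flip rate), and `eps_j` (per-labeler flip rate) are hypothetical names, and the Gaussian process predictive probabilities are supplied as plain numbers.

```python
import numpy as np

def flip_likelihood(p_true, epsilon, eps_j):
    """Probability that labeler j reports +1 under a two-level flip model.

    p_true  : model's probability that the latent label is +1
    epsilon : global flip rate corrupting the true label
    eps_j   : labeler j's individual flip rate, applied on top
    (Illustrative parameterization, not the paper's exact equations.)
    """
    # First level: global label noise flips the true label
    p_noisy = p_true * (1 - epsilon) + (1 - p_true) * epsilon
    # Second level: labeler-specific noise flips the noisy label
    return p_noisy * (1 - eps_j) + (1 - p_noisy) * eps_j

def entropy(p):
    """Binary prediction entropy of probability p (in nats)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_sample_and_labeler(pred_probs, labeler_eps):
    """Pick the unlabeled sample with maximum predictive entropy and the
    labeler with the lowest estimated flip rate (highest expertise)."""
    i = int(np.argmax(entropy(pred_probs)))
    j = int(np.argmin(labeler_eps))
    return i, j

pred = np.array([0.95, 0.55, 0.10])   # GP predictive probabilities (toy values)
eps_j = np.array([0.30, 0.05, 0.20])  # estimated per-labeler flip rates
print(select_sample_and_labeler(pred, eps_j))  # → (1, 1)
```

In the full model these flip rates are not fixed by hand: the generalized EM algorithm re-estimates the global noise and each labeler's expertise from the crowd labels, and the selection step above is repeated each active-learning round.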

Keywords

Active learning · Crowdsourcing · Gaussian process classifiers

Acknowledgments

Research reported in this publication was partly supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR015371. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work is also partly supported by US National Science Foundation Grant IIS 1350763, China National Natural Science Foundation Grant 61228303, GH's start-up funds from Stevens Institute of Technology, a Google Research Faculty Award, a gift grant from Microsoft Research, and a gift grant from NEC Labs America.


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Stevens Institute of Technology, Hoboken, USA
  2. Microsoft Research, Redmond, USA