Abstract
Most current vision algorithms deliver their output ‘as is’, without indicating whether it is correct or not. In this paper we propose evaluator algorithms that predict if a vision algorithm has succeeded. We illustrate this idea for the case of Human Pose Estimation (HPE).
We describe the stages required to learn and test an evaluator. These include the use of an annotated ground truth dataset for training and testing the evaluator (we provide a new dataset for the HPE case), and the development of auxiliary features that are not used by the HPE algorithm itself, but from which the evaluator can learn to predict whether the output is correct.
Then an evaluator is built for each of four recently developed HPE algorithms using their publicly available implementations: Eichner and Ferrari [5], Sapp et al. [16], Andriluka et al. [2] and Yang and Ramanan [22]. We demonstrate that in each case the evaluator is able to predict if the algorithm has correctly estimated the pose or not.
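As a rough illustration of the learning setup described above, the following Python sketch trains a toy evaluator: outputs of a vision algorithm are labelled correct or incorrect by comparing them with ground truth, and a classifier is then fitted on auxiliary features to predict that label. Everything here is an assumption made for illustration and is not taken from the paper: the logistic-regression classifier, the two feature names (`quality`, `clutter`), the tolerance `TOL`, and the synthetic data that stands in for real HPE outputs.

```python
import math
import random

def label_outputs(errors, tol):
    """Mark an output as correct (1) when its error against ground truth is within tol."""
    return [1 if e <= tol else 0 for e in errors]

def predict_success(w, b, x):
    """Estimated probability that the algorithm's output with features x is correct."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_evaluator(features, labels, lr=1.0, epochs=2000):
    """Fit a logistic-regression evaluator by batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            residual = predict_success(w, b, x) - y
            for j in range(dim):
                grad_w[j] += residual * x[j]
            grad_b += residual
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic stand-in data (assumption): one informative and one uninformative
# auxiliary feature, plus an error value playing the role of pose error
# measured against ground truth annotation.
rng = random.Random(0)

def make_example():
    quality = rng.uniform(0.0, 1.0)   # hypothetical informative feature
    clutter = rng.uniform(0.0, 1.0)   # hypothetical uninformative feature
    error = 1.0 - quality + rng.gauss(0.0, 0.05)
    return [quality, clutter], error

TOL = 0.5  # hypothetical tolerance below which an output counts as correct
train = [make_example() for _ in range(200)]
test = [make_example() for _ in range(100)]

# Train on one split, then measure how often the evaluator agrees with the
# ground-truth-derived label on the held-out split.
w, b = train_evaluator([x for x, _ in train],
                       label_outputs([e for _, e in train], TOL))
test_labels = label_outputs([e for _, e in test], TOL)
predictions = [1 if predict_success(w, b, x) >= 0.5 else 0 for x, _ in test]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print("held-out evaluator accuracy:", accuracy)
```

The point of the sketch is the separation of roles: ground truth is needed only to produce training and test labels, while at prediction time the evaluator sees the auxiliary features alone.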
References
[1] Aggarwal, G., Biswas, S., Flynn, P.J., Bowyer, K.W.: Predicting performance of face recognition systems: An image characterization approach. In: IEEE Conf. on Comp. Vis. and Pat. Rec. Workshops, pp. 52–59. IEEE Press (2011)
[2] Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: People detection and articulated pose estimation. In: IEEE Conf. on Comp. Vis. and Pat. Rec. IEEE Press (2009)
[3] Boshra, M., Bhanu, B.: Predicting performance of object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(9), 956–969 (2000)
[4] Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: IEEE Int. Conf. on Comp. Vis., pp. 1365–1372. IEEE Press (2009)
[5] Eichner, M., Ferrari, V.: Better appearance models for pictorial structures. In: Cavallaro, A., Prince, S., Alexander, D. (eds.) Proceedings of the British Machine Vision Conference, pp. 3:1–3:3. BMVA Press (2009)
[6] Eichner, M., Marin, M., Zisserman, A., Ferrari, V.: Articulated human pose estimation and search in (almost) unconstrained still images. ETH Technical report (2010)
[7] Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision 88(2) (2010)
[8] Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conf. on Comp. Vis. and Pat. Rec. IEEE Press (2008)
[9] Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. International Journal of Computer Vision 61(1), 55–79 (2005)
[10] Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Pose search: Retrieving people using their pose. In: IEEE Conf. on Comp. Vis. and Pat. Rec. IEEE Press (2009)
[11] Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: IEEE Conf. on Comp. Vis. and Pat. Rec. IEEE Press (2008)
[12] Mac Aodha, O., Brostow, G.J., Pollefeys, M.: Segmenting video into classes of algorithm-suitability. In: IEEE Conf. on Comp. Vis. and Pat. Rec., pp. 1054–1061. IEEE Press (2010)
[13] Pishchulin, L., Jain, A., Andriluka, M., Thormählen, T., Schiele, B.: Articulated people detection and pose estimation: Reshaping the future. In: IEEE Conf. on Comp. Vis. and Pat. Rec., pp. 3178–3185. IEEE Press (2012)
[14] Ramanan, D.: Learning to parse images of articulated bodies. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 1129–1136. MIT Press, Cambridge (2007)
[15] Ronfard, R., Schmid, C., Triggs, B.: Learning to Parse Pictures of People. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part IV. LNCS, vol. 2353, pp. 700–714. Springer, Heidelberg (2002)
[16] Sapp, B., Toshev, A., Taskar, B.: Cascaded Models for Articulated Pose Estimation. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 406–420. Springer, Heidelberg (2010)
[17] Scheirer, W.J., Bendale, A., Boult, T.E.: Predicting biometric facial recognition failure with similarity surfaces and support vector machines. In: IEEE Conf. on Comp. Vis. and Pat. Rec. Workshops, pp. 1–8. IEEE Press (2008)
[18] Viola, P., Jones, M.J.: Robust real-time face detection. International Journal of Computer Vision 57(2), 137–154 (2004)
[19] Wang, P., Ji, Q., Wayman, J.L.: Modeling and predicting face recognition system performance based on analysis of similarity scores. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(4), 665–670 (2007)
[20] Wang, R., Bhanu, B.: Learning models for predicting recognition performance. In: IEEE Int. Conf. on Comp. Vis., vol. 2, pp. 1613–1618. IEEE Press (2005)
[21] Willems, G., Becker, J.H., Tuytelaars, T., Van Gool, L.: Exemplar-based action recognition in video. In: Cavallaro, A., Prince, S., Alexander, D. (eds.) Proceedings of the British Machine Vision Conference, pp. 90.1–90.11. BMVA Press (2009)
[22] Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Conf. on Comp. Vis. and Pat. Rec. IEEE Press (2011)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jammalamadaka, N., Zisserman, A., Eichner, M., Ferrari, V., Jawahar, C.V. (2012). Has My Algorithm Succeeded? An Evaluator for Human Pose Estimators. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33712-3_9
DOI: https://doi.org/10.1007/978-3-642-33712-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33711-6
Online ISBN: 978-3-642-33712-3
eBook Packages: Computer Science, Computer Science (R0)