Abstract
Object detection is challenging when the object class exhibits large within-class variations. In this work, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be learned jointly using a multiplicative form of two kernel functions. Model training is accomplished via standard SVM learning. When foreground object masks are provided during training, the detectors can also produce object segmentations. We also propose a tracking-by-detection framework, built on our model, that recovers the foreground state in video sequences. The advantages of our method are demonstrated on object detection, view-angle estimation, and tracking. Our approach compares favorably with existing methods on hand and vehicle detection tasks, and quantitative tracking results are given on sequences of moving vehicles and human faces.
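The central idea of the abstract, a kernel that multiplies a kernel on image features x by a kernel on the within-class parameter t (e.g. pose or view angle), can be illustrated with a small sketch. The choice of Gaussian RBF kernels for both factors, the parameters `gamma_x` and `gamma_t`, and all helper names below are assumptions for illustration, not necessarily the kernels or notation used in the chapter. The scoring function shows why one trained model can act as a family of detectors: fixing t reweights each support vector by its parameter-kernel value k_t(t, t_i).

```python
import math

def rbf(u, v, gamma):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2) on equal-length tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mult_kernel(x, t, x2, t2, gamma_x=0.5, gamma_t=1.0):
    """Product kernel on (feature, parameter) pairs: k_t(t, t2) * k_x(x, x2)."""
    return rbf(t, t2, gamma_t) * rbf(x, x2, gamma_x)

def gram(samples, **kw):
    """Gram matrix of the product kernel over a list of (x, t) pairs.

    Both factors are positive semidefinite, so their elementwise product is too
    (Schur product theorem); such a matrix can be passed to an SVM solver that
    accepts precomputed kernels.
    """
    return [[mult_kernel(x, t, x2, t2, **kw) for (x2, t2) in samples]
            for (x, t) in samples]

def detector_score(x, t, support, coef, b, **kw):
    """f(x, t) = sum_i coef_i * k_t(t, t_i) * k_x(x, x_i) + b, coef_i = alpha_i * y_i.

    Fixing t turns coef_i * k_t(t, t_i) into per-sample weights, so each
    parameter value t selects a different detector from the same model.
    """
    return sum(c * mult_kernel(x, t, xi, ti, **kw)
               for c, (xi, ti) in zip(coef, support)) + b

# Hypothetical support vectors ((feature vector), (parameter)) and coefficients;
# in practice these would come from SVM training on the product-kernel Gram matrix.
support = [((0.0, 0.0), (0.0,)), ((1.0, 1.0), (1.0,)), ((5.0, 5.0), (0.5,))]
coef = [1.0, 1.0, -1.0]
b = 0.0

x = (0.0, 0.0)
print(round(detector_score(x, (0.0,), support, coef, b), 3))  # 1.135
print(round(detector_score(x, (1.0,), support, coef, b), 3))  # 0.736
```

The same feature vector scores differently under the detectors for t = 0 and t = 1: the first support vector matches x exactly, but its contribution is fully weighted only when the queried parameter equals its own parameter.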
Notes
1. In this paper, all vector variables are column vectors.
2. Available at http://cs-people.bu.edu/yq/projects/mk.html.
3. Available at http://cs-people.bu.edu/yq/projects/mk.html.
Acknowledgments
This paper reports work that was supported in part by the U.S. National Science Foundation under grants IIS-0705749 and IIS-0713168.
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Yuan, Q., Thangali, A., Ablavsky, V., Sclaroff, S. (2013). Learning a Family of Detectors via Multiplicative Kernels. In: Tavares, J., Natal Jorge, R. (eds) Topics in Medical Image Processing and Computational Vision. Lecture Notes in Computational Vision and Biomechanics, vol 8. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0726-9_1
DOI: https://doi.org/10.1007/978-94-007-0726-9_1
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-0725-2
Online ISBN: 978-94-007-0726-9
eBook Packages: Engineering, Engineering (R0)