Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes
We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis.
The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., AdaBoost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.
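The two ideas named above, spatial integration with a flexible norm and Bayesian combination of cues with a prior, can be illustrated with a minimal sketch. The abstract does not give the form of the norm or of the likelihood models, so everything here is an assumption: the norm is taken to be a normalised Minkowski p-norm, the cues are treated as conditionally independent, and the function names, window values, and likelihood ratios are hypothetical.

```python
import math
from typing import Sequence


def minkowski_pool(responses: Sequence[float], p: float) -> float:
    """Pool non-negative cue responses over a spatial window.

    Uses a normalised Minkowski p-norm (an assumed form of the "flexible norm"):
    p = 1 is the window mean, and large p approaches the window maximum.
    """
    if not responses or p <= 0:
        raise ValueError("need a non-empty window and p > 0")
    return (sum(r ** p for r in responses) / len(responses)) ** (1.0 / p)


def posterior_human(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Posterior probability of a human given one likelihood ratio per cue.

    Each ratio is P(cue | human) / P(cue | background). Cues are assumed
    conditionally independent, so their log ratios add to the prior log odds.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical 4-pixel windows: a dense, weak cue and a sparse, strong cue.
dense = [0.5, 0.5, 0.5, 0.5]
sparse = [1.0, 0.0, 0.0, 0.0]

for p in (1.0, 8.0):
    print(p, minkowski_pool(dense, p), minkowski_pool(sparse, p))

# A low prior (e.g. a location the attentive sensor recently rejected)
# offset by two moderately informative cues.
print(posterior_human(prior=0.1, likelihood_ratios=[3.0, 3.0]))
```

With p = 1 the dense window pools higher than the sparse one, and with p = 8 the order reverses, which is the sense in which a tunable exponent can serve both kinds of cue. The `prior` argument stands in for the feedback path: attentive detections and gaze state would raise or lower it per location before the pre-attentive cues are combined.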
International Journal of Computer Vision
Volume 72, Issue 1, pp. 47-66
- Kluwer Academic Publishers