International Journal of Computer Vision, Volume 38, Issue 1, pp 15–33

A Trainable System for Object Detection

  • Constantine Papageorgiou
  • Tomaso Poggio


This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. This example-based learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. We present results on face, people, and car detection tasks using the same architecture. In addition, we quantify how the representation affects detection performance by considering several alternative representations, including pixels and principal components. We also describe a real-time application of our person detection system as part of a driver assistance system.
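The abstract's two ingredients, an overcomplete dictionary of oriented intensity differences between adjacent regions and a support vector machine trained on positive and negative examples, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation, and it rests on assumptions the abstract does not state. Box sums are evaluated with a summed-area table. The dictionary is made overcomplete by shifting each wavelet support by a quarter of its size. The scales (4 and 2 pixels) are toy values. The classifier is a linear SVM trained by Pegasos-style stochastic subgradient descent, a simple stand-in for a full SVM solver and kernel. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Image = List[List[float]]


def integral_image(img: Image) -> List[List[float]]:
    """Summed-area table with a zero border: ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Sum of the w-by-h box whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_responses(ii: List[List[float]], x: int, y: int, s: int) -> List[float]:
    """Three oriented Haar differences over an s-by-s support (s must be even)."""
    half = s // 2
    total = box_sum(ii, x, y, s, s)
    left = box_sum(ii, x, y, half, s)
    top = box_sum(ii, x, y, s, half)
    tl = box_sum(ii, x, y, half, half)
    br = box_sum(ii, x + half, y + half, half, half)
    left_right = (total - left) - left        # right half minus left half
    top_bottom = (total - top) - top          # bottom half minus top half
    diagonal = (tl + br) - (total - tl - br)  # main-diagonal quadrants minus the other two
    return [left_right, top_bottom, diagonal]


def haar_features(img: Image, scales: Sequence[int] = (4, 2)) -> List[float]:
    """Overcomplete feature vector: every scale, with overlapping placements."""
    ii = integral_image(img)
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for s in scales:
        step = max(1, s // 4)  # shift by a quarter of the support, at least one pixel
        for y in range(0, h - s + 1, step):
            for x in range(0, w - s + 1, step):
                feats.extend(haar_responses(ii, x, y, s))
    return feats


def _augment(x: Sequence[float]) -> List[float]:
    return list(x) + [1.0]  # constant feature so the last weight acts as a bias


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) via Pegasos-style steps; labels in {-1, +1}."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss is active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    # Toy 4x4 "windows": the positive is bright on the right, the negative on the left.
    pos = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    neg = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
    X = [haar_features(pos), haar_features(neg)]
    w = train_linear_svm(X, [1, -1])
    print([predict(w, x) for x in X])
```

Because each response is a difference of box sums, every feature costs a fixed number of table lookups regardless of its scale. In a detector, each candidate window would be mapped through `haar_features` and scored with `predict`.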

Keywords

computer vision, machine learning, pattern recognition, people detection, face detection, car detection

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Constantine Papageorgiou (1)
  • Tomaso Poggio (1)
  1. Center for Biological and Computational Learning, Artificial Intelligence Laboratory, MIT, Cambridge, USA