A New Class of Learnable Detectors for Categorisation

  • Jiri Matas
  • Karel Zimmermann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3540)


A new class of image-level detectors is proposed that can be adapted by machine learning techniques to detect parts of objects from a given category. A classifier within the detector (e.g. a neural network or an AdaBoost-trained classifier) selects a relevant subset of extremal regions, i.e. regions that are connected components of a thresholded image. Because extremal regions are unaffected by monotonic changes of image intensity, the detector is highly robust to illumination change. Robustness to viewpoint change is achieved by using invariant descriptors and/or by letting the classifier model shape variations.
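As a minimal illustration of the definition above (a sketch, not the authors' implementation), the Python code below enumerates the dark extremal regions of a small grey-level image by thresholding at every distinct grey level and flood-filling 4-connected components. It also checks that a strictly increasing grey-level mapping, a simple model of a global brightness/contrast change, leaves the set of regions unchanged. The function names, the 4-connectivity and the dark-region polarity are assumptions, and the size test passed to the hypothetical `select_regions` is only a placeholder for the trained classifier (e.g. a neural network or AdaBoost) mentioned in the abstract.

```python
from collections import deque


def extremal_regions(img):
    """Return all dark extremal regions of a 2-D grey-level image.

    Assumption: a region is a 4-connected component of {p : img[p] <= t}
    for some threshold t. Regions are frozensets of (row, col) coordinates,
    so a component that persists over several thresholds is stored once.
    """
    h, w = len(img), len(img[0])
    regions = set()
    # Components can only change at grey levels that occur in the image.
    for t in sorted({v for row in img for v in row}):
        seen = set()
        for r in range(h):
            for c in range(w):
                if img[r][c] > t or (r, c) in seen:
                    continue
                # Breadth-first flood fill of one connected component.
                comp, queue = [], deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and (ny, nx) not in seen and img[ny][nx] <= t):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                regions.add(frozenset(comp))
    return regions


def select_regions(regions, classifier):
    """Keep the regions accepted by `classifier` (hypothetical stand-in)."""
    return [reg for reg in regions if classifier(reg)]


img = [[5, 5, 5, 5],
       [5, 1, 9, 5],
       [5, 9, 2, 5],
       [5, 5, 5, 5]]
# A strictly increasing grey-level mapping (global brightness/contrast change).
brighter = [[3 * v + 7 for v in row] for row in img]
assert extremal_regions(img) == extremal_regions(brighter)
print(len(extremal_regions(img)))  # 4

# Placeholder "classifier": accept regions of 2 to 14 pixels.
print(len(select_regions(extremal_regions(img), lambda reg: 2 <= len(reg) <= 14)))  # 1
```

Thresholding only at grey levels present in the image is enough, because the thresholded set, and hence its components, cannot change between two consecutive levels.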

The approach is brought to bear on three problems: text detection, face segmentation and leopard skin detection. For unconstrained (i.e. brightness-, affine- and font-invariant) text detection, a high detection rate (92%) was obtained with a reasonable false positive rate.
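The abstract does not state how detections are matched to ground truth or how the false positive rate is normalised. Under one common convention, assumed here, the detection rate is the fraction of ground-truth objects that are detected, the false negative rate is its complement, and false positives are counted per test image. The hypothetical helper below computes these quantities from raw counts; the counts in the usage line are illustrative only and are not results from the paper.

```python
def detection_metrics(true_positives, ground_truth, false_positives, images):
    """Return (detection rate, false negative rate, false positives per image).

    Assumed conventions: detection rate = detected objects / ground-truth
    objects; false negative rate = 1 - detection rate; false positives are
    normalised by the number of test images.
    """
    if ground_truth <= 0 or images <= 0:
        raise ValueError("need at least one ground-truth object and one image")
    detection_rate = true_positives / ground_truth
    return detection_rate, 1.0 - detection_rate, false_positives / images


# Illustrative counts only (not from the paper).
dr, fnr, fppi = detection_metrics(true_positives=45, ground_truth=50,
                                  false_positives=12, images=4)
print(f"detection {dr:.0%}, miss {fnr:.0%}, {fppi:.1f} false positives/image")
```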

The time complexity of detection is approximately linear in the number of pixels; a non-optimized implementation runs at about 1 frame per second on 640 × 480 images on a high-end PC.
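Enumerating extremal regions by flood-filling separately at every threshold would cost time proportional to the number of pixels times the number of grey levels; a 640 × 480 frame alone has 307,200 pixels. A near-linear alternative, in the spirit of the union-find procedure commonly used to compute maximally stable extremal regions, inserts pixels in order of increasing intensity and merges each one with its already-inserted 4-neighbours, so every extremal region appears exactly once as a union-find component. The sketch below assumes dark regions, 4-connectivity and a hypothetical function name `extremal_region_areas`; it records the grey level and area of each new region and is not the authors' implementation.

```python
def extremal_region_areas(img):
    """Return (grey level, area) for every dark extremal region, via union-find."""
    h, w = len(img), len(img[0])
    n = h * w
    parent = list(range(n))
    size = [1] * n          # valid at component roots only
    active = [False] * n    # pixel already inserted (value <= current level)

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:   # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]      # area updated incrementally on merge

    # Group pixel indices by grey level (a bin sort for 8-bit images).
    levels = {}
    for idx in range(n):
        levels.setdefault(img[idx // w][idx % w], []).append(idx)

    regions = []
    for t in sorted(levels):
        for idx in levels[t]:
            active[idx] = True
            r, c = divmod(idx, w)
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and active[nr * w + nc]:
                    union(idx, nr * w + nc)
        # Only components containing a pixel of level t are new regions;
        # all others are identical to a region recorded at a lower level.
        roots = {find(idx) for idx in levels[t]}
        regions.extend((t, size[root]) for root in sorted(roots))
    return regions


img = [[5, 5, 5, 5],
       [5, 1, 9, 5],
       [5, 9, 2, 5],
       [5, 5, 5, 5]]
print(extremal_region_areas(img))  # [(1, 1), (2, 1), (5, 14), (9, 16)]
```

Because a merge touches only two roots, other additive or min/max statistics (moments, bounding boxes) can be maintained the same way as the area, which is one way to keep per-region descriptors cheap enough to classify every extremal region.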


Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jiri Matas (1)
  • Karel Zimmermann (1)
  1. Center for Machine Perception, Faculty of Electrotechnical Engineering, Czech Technical University in Prague
