VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval

  • Ken Chatfield
  • Andrew Zisserman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7725)


This paper addresses the problem of object category retrieval in large unannotated image datasets. Our aim is to enable both fast learning of an object category model and fast retrieval over the dataset. With these elements we show that new visual concepts can be learnt on-the-fly from a text description, so that images of that category can then be retrieved from the dataset in real time.
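
The abstract does not detail the learning step. The following is a minimal sketch under stated assumptions: every image is already encoded as a fixed-length feature vector, positive examples are images gathered for the text query (for example from a web image search), and negatives come from a fixed, precomputed pool. A linear SVM is trained with Pegasos-style stochastic sub-gradient descent, and the dataset is then ranked by classifier score. The function names and parameters (`train_pegasos`, `rank_dataset`, `lam`, `epochs`) are hypothetical and illustrative only.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pegasos(pos, neg, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss, L2 regulariser, no bias) trained with Pegasos SGD.

    pos: feature vectors of (possibly noisy) positive images for the query.
    neg: feature vectors from a fixed negative pool (assumption).
    """
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y * dot(w, x)         # margin under the current w
            # Gradient step for the regulariser: shrink w.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def rank_dataset(w, dataset):
    """Return dataset indices sorted by decreasing classifier score w.x."""
    scored = sorted(((dot(w, x), i) for i, x in enumerate(dataset)), reverse=True)
    return [i for _, i in scored]
```

For example, `rank_dataset(train_pegasos(pos, neg), dataset)` returns image indices with the most confident matches first. Because a linear classifier scores each image with a single dot product, ranking cost grows linearly with dataset size, which is what makes on-the-fly retrieval plausible.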

To this end we compare state-of-the-art encoding methods and introduce a novel cascade retrieval architecture, focusing on the best trade-off between three important performance measures for a real-time system of this kind: (i) class accuracy, (ii) memory footprint, and (iii) speed.
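
The cascade is not specified in this abstract. The following is a hedged sketch of the general two-stage idea, not the authors' exact architecture. Every image is first scored with a cheap, compact descriptor held in memory. Only a shortlist of the top `k` is kept, and that shortlist is re-ranked with a more accurate, high-dimensional descriptor. The dot-product scorer, the names `cascade_retrieve` and `memory_mb`, and the parameters `k` and `top_n` are illustrative assumptions. The memory helper makes the footprint side of the trade-off concrete.

```python
import heapq

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def cascade_retrieve(w_small, w_full, compact, full, k, top_n):
    """Two-stage cascade: cheap scoring of all images, expensive re-ranking of a shortlist.

    compact[i]: low-dimensional/compressed descriptor of image i (kept in RAM).
    full[i]:    high-dimensional descriptor of image i (only read for the shortlist).
    """
    # Stage 1: score the whole dataset with the compact classifier, keep the top k.
    shortlist = heapq.nlargest(k, range(len(compact)),
                               key=lambda i: dot(w_small, compact[i]))
    # Stage 2: re-score only the shortlist with the full classifier.
    reranked = sorted(shortlist, key=lambda i: dot(w_full, full[i]), reverse=True)
    return reranked[:top_n]

def memory_mb(num_images, dim, bytes_per_dim):
    """Descriptor storage in megabytes (10**6 bytes)."""
    return num_images * dim * bytes_per_dim / 1e6
```

As an illustration with hypothetical sizes, 100,000 images with 2,048-dimensional single-precision descriptors need `memory_mb(100_000, 2048, 4)`, about 819.2 MB, whereas a 128-byte compact code per image needs 12.8 MB. The shortlist size `k` trades stage-2 cost against the risk of discarding relevant images in stage 1.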

We show that an on-the-fly system is possible and compare its performance (using noisy training images) with that obtained using carefully curated training images. For this evaluation we use the PASCAL VOC 2007 dataset, together with 100k images from ImageNet acting as distractors.
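
The abstract does not state the evaluation measure. Ranked retrieval on VOC 2007 is commonly scored with average precision (AP). The following is a minimal sketch of the 11-point interpolated AP defined for the VOC 2007 challenge, assuming each distractor image counts as a non-relevant entry in the ranking. The function name `voc07_ap` and its arguments are hypothetical.

```python
def voc07_ap(ranked_labels, num_positives):
    """11-point interpolated average precision (VOC 2007 style).

    ranked_labels: 1 for a relevant image, 0 otherwise, in ranked order
                   (distractors are assumed to appear as 0s).
    num_positives: total number of relevant images in the dataset.
    """
    precisions, recalls = [], []
    tp = 0
    for rank, label in enumerate(ranked_labels, start=1):
        tp += label
        precisions.append(tp / rank)
        recalls.append(tp / num_positives)
    ap = 0.0
    for i in range(11):
        t = i / 10  # recall thresholds 0.0, 0.1, ..., 1.0
        # Interpolated precision: the best precision at any recall >= t.
        best = max((p for p, r in zip(precisions, recalls) if r >= t), default=0.0)
        ap += best / 11
    return ap
```

A perfect ranking gives an AP of 1.0. Relevant images pushed below distractors lower the precision values and therefore the AP. Positives that are never retrieved cap the reachable recall, and every threshold above that recall contributes zero.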







Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ken Chatfield (1)
  • Andrew Zisserman (1)
  1. University of Oxford, United Kingdom
