Efficient Media Retrieval from Non-Cooperative Queries

  • Kevin Shih
  • Wei Di
  • Vignesh Jagadeesh
  • Robinson Piramuthu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9163)


Text is ubiquitous in the artificial world and easily attainable when it comes to book titles and author names. Using the images from the book cover set of the Stanford Mobile Visual Search dataset and additional book covers and metadata from, we construct a large-scale book cover retrieval dataset, complete with 100K distractor covers and title and author strings for each.

Because our query images are poorly conditioned for clean text extraction, we propose a method for extracting noisy and erroneous OCR readings and matching them against clean author and book title strings in a standard document look-up problem setup. Finally, we demonstrate how to use this text matching as a feature in conjunction with popular retrieval features such as VLAD, using a simple learning setup, to achieve significant improvements in retrieval accuracy over that of either VLAD or the text alone.
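As a rough illustration of the look-up idea described above, the following Python sketch matches a noisy OCR string against clean title and author strings and then combines the text score with a visual score. It is not the paper's method: the character-trigram TF-IDF representation, the cosine scoring, the fixed-weight fusion in the hypothetical `fuse` helper, and the toy catalog are all assumptions made for illustration. The abstract says only that the combination is learned with a simple setup, so the fixed `text_weight` stands in for whatever that learning step produces.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase, collapse whitespace, and return padded character n-grams."""
    cleaned = " ".join(text.lower().split())
    padded = " " * (n - 1) + cleaned + " " * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramIndex:
    """TF-IDF index over character n-grams of clean catalog strings (assumed design)."""

    def __init__(self, documents, n=3):
        self.n = n
        self.documents = list(documents)
        counts = [Counter(char_ngrams(d, n)) for d in self.documents]
        doc_freq = Counter()
        for c in counts:
            doc_freq.update(c.keys())
        total = len(self.documents)
        # Smoothed inverse document frequency.
        self.idf = {g: math.log((1 + total) / (1 + k)) + 1.0
                    for g, k in doc_freq.items()}
        self.vectors = [self._weight(c) for c in counts]

    def _weight(self, counts):
        # Drop n-grams never seen in the catalog, then L2-normalize.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def scores(self, query):
        """Cosine similarity between the query and every catalog string."""
        q = self._weight(Counter(char_ngrams(query, self.n)))
        return [sum(w * d.get(g, 0.0) for g, w in q.items()) for d in self.vectors]


def fuse(text_scores, visual_scores, text_weight=0.5):
    """Fixed-weight late fusion; a placeholder for a learned combination."""
    return [text_weight * t + (1.0 - text_weight) * v
            for t, v in zip(text_scores, visual_scores)]


# Toy catalog of clean "title author" strings (illustrative only).
catalog = [
    "Introduction to Information Retrieval Manning Raghavan Schutze",
    "Computer Vision Algorithms and Applications Szeliski",
    "Pattern Recognition and Machine Learning Bishop",
]
index = NgramIndex(catalog)

# Simulated OCR output with typical character confusions (I/l, o/0, i/1, l/1).
noisy_ocr = "lntroducti0n to lnformat1on Retrieva1 Mann1ng"
text_scores = index.scores(noisy_ocr)
best = max(range(len(catalog)), key=text_scores.__getitem__)
print(catalog[best])
```

Character n-grams are used here because a single misread character changes only a few n-grams, whereas it would break an exact word match. In a combined system, the visual scores passed to `fuse` would come from a descriptor such as VLAD, normalized to a range comparable to the text scores.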


Large scale, Media retrieval, Text



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Kevin Shih (1)
  • Wei Di (2)
  • Vignesh Jagadeesh (2)
  • Robinson Piramuthu (2)
  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
  2. eBay Research Labs, San Jose, USA
