International Journal of Computer Vision, Volume 96, Issue 3, pp 290–314

Location Discriminative Vocabulary Coding for Mobile Landmark Search

  • Rongrong Ji
  • Ling-Yu Duan
  • Jie Chen
  • Hongxun Yao
  • Junsong Yuan
  • Yong Rui
  • Wen Gao


With the popularization of mobile devices, recent years have witnessed an emerging potential for mobile landmark search. In this scenario, the user experience heavily depends on the efficiency of query transmission over a wireless link. As sending a query photo is time consuming, recent works have proposed extracting compact visual descriptors directly on the mobile end to enable low bit rate transmission. Typically, these descriptors are extracted based solely on the visual content of a query, and the location cues from the mobile end are rarely exploited. In this paper, we present a Location Discriminative Vocabulary Coding (LDVC) scheme, which achieves extremely low bit rate query transmission, discriminative landmark description, and scalable descriptor delivery in a unified framework. Our first contribution is a compact and location discriminative visual landmark descriptor, which is learnt offline in two steps. First, we adopt spectral clustering to segment a city map into distinct geographical regions, where both visual and geographical similarities are fused to optimize the partition of city-scale geo-tagged photos. Second, we propose to learn LDVC in each region with two schemes: (1) a Ranking Sensitive PCA and (2) a Ranking Sensitive Vocabulary Boosting. Both schemes embed location cues to learn a compact descriptor that minimizes the retrieval ranking loss incurred by replacing the original high-dimensional signatures. Our second contribution is a location aware online vocabulary adaption: we store a single vocabulary in the mobile end, which is efficiently adapted for region specific LDVC coding once a mobile device enters a given region. The learnt LDVC landmark descriptor is extremely compact (typically 10–50 bits with arithmetic coding) and outperforms state-of-the-art descriptors. We implemented the framework in a real-world mobile landmark search prototype, which is validated on a million-scale landmark database covering typical areas, e.g., Beijing, New York City, Lhasa, Singapore, and Florence.
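The first offline step, partitioning geo-tagged photos by fusing visual and geographical similarity, can be illustrated with a small sketch. This is not the authors' formulation: the product of two Gaussian kernels used for fusion, the bandwidths `sigma_geo` and `sigma_vis`, the two-way split by the sign of the second eigenvector (a simplification of full spectral clustering), and the function names `fused_affinity` and `spectral_bipartition` are all assumptions made for illustration.

```python
import math

def fused_affinity(coords, feats, sigma_geo=1.0, sigma_vis=1.0):
    """Assumed fusion rule: W[i][j] = geo Gaussian kernel * visual Gaussian kernel."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-affinity
            dg2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            dv2 = sum((a - b) ** 2 for a, b in zip(feats[i], feats[j]))
            W[i][j] = (math.exp(-dg2 / (2 * sigma_geo ** 2))
                       * math.exp(-dv2 / (2 * sigma_vis ** 2)))
    return W

def spectral_bipartition(W, iters=200):
    """Split items in two by the sign of the second eigenvector of D^-1/2 W D^-1/2.

    Uses power iteration with deflation of the known top eigenvector.
    Assumes every item has positive total affinity and that the fixed start
    vector is not orthogonal to the second eigenvector.
    """
    n = len(W)
    d = [sum(row) for row in W]          # node degrees
    s = [math.sqrt(x) for x in d]
    M = [[W[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    norm1 = math.sqrt(sum(d))
    v1 = [x / norm1 for x in s]          # top eigenvector of M (eigenvalue 1)
    x = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        # Multiply by (M + I) so that all eigenvalues are non-negative.
        y = [x[i] + sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        proj = sum(a * b for a, b in zip(y, v1))
        y = [a - proj * b for a, b in zip(y, v1)]   # remove the v1 component
        norm = math.sqrt(sum(a * a for a in y))
        x = [a / norm for a in y]
    u = [x[i] / s[i] for i in range(n)]  # map back from the normalized space
    # Fix the sign ambiguity so that item 0 always receives label 0.
    return [0 if (u[i] >= 0) == (u[0] >= 0) else 1 for i in range(n)]

# Toy data: six photos, two spatially separate groups with distinct appearance.
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
feats = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.1), (0.0, 1.0), (0.1, 0.9), (0.0, 0.9)]
labels = spectral_bipartition(fused_affinity(coords, feats))
print(labels)
```

A city-scale partition would need more than two regions and a sparse eigensolver; the sketch only shows how a fused affinity matrix feeds a spectral split.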


Keywords: Mobile landmark search; Compact visual descriptor; Vocabulary compression; Two-way coding; Descriptor adaption; System applications





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Rongrong Ji (1, 2)
  • Ling-Yu Duan (1)
  • Jie Chen (1)
  • Hongxun Yao (2)
  • Junsong Yuan (3)
  • Yong Rui (4)
  • Wen Gao (1)

  1. Institute of Digital Media, Peking University, Beijing, China
  2. Visual Intelligence Laboratory, Harbin Institute of Technology, Harbin, China
  3. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  4. Microsoft China Research and Development Group, Beijing, China
