Efficient Histogram-Based Similarity Search in Ultra-High Dimensional Space

  • Jiajun Liu
  • Zi Huang
  • Heng Tao Shen
  • Xiaofang Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6588)


Recent developments in image content analysis have shown that the dimensionality of an image feature can reach thousands or more for satisfactory results in some applications, such as face recognition. Although high-dimensional indexing has been extensively studied in the database literature, most existing methods are tested on feature spaces with fewer than hundreds of dimensions, and their performance degrades quickly as dimensionality increases. Given the huge popularity of histogram features in representing image content, in this paper we propose a novel indexing structure for efficient histogram-based similarity search in ultra-high dimensional space which is also sparse. Observing that all possible histogram values in a domain form a finite set of discrete states, we leverage the time and space efficiency of the inverted file. Our new structure, named the two-tier inverted file, indexes the data space in two levels, where the first level represents the list of occurring states for each individual dimension, and the second level represents the list of occurring images for each state. In the query process, candidates can be quickly identified with a simple weighted state-voting scheme before their actual distances to the query are computed. To further enrich the discriminative power of the inverted file, an effective state expansion method is also introduced by taking neighbor dimensions’ information into consideration. Our extensive experimental results on real-life face datasets with 15,488-dimensional histogram features demonstrate the high accuracy and the great performance improvement of our proposal over existing methods.
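
The two-level layout and the vote-then-verify query flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the uniform vote weight, the choice to skip zero-valued bins, and the L1 distance used for final ranking are all assumptions made for illustration, and the state expansion step is not modeled.

```python
from collections import defaultdict


def build_index(histograms):
    """Build a two-level map: dimension -> state -> list of image ids."""
    index = defaultdict(lambda: defaultdict(list))
    for image_id, hist in enumerate(histograms):
        for dim, state in enumerate(hist):
            if state != 0:  # assumption: zero bins are not indexed (sparse data)
                index[dim][state].append(image_id)
    return index


def l1_distance(a, b):
    """Assumed distance for the final ranking; the abstract does not name one."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search(index, histograms, query, num_candidates=2, k=1):
    """Vote for images that share a state with the query, then rank by distance."""
    votes = defaultdict(float)
    for dim, state in enumerate(query):
        if state == 0:
            continue
        # First level: the dimension; second level: the state within it.
        for image_id in index.get(dim, {}).get(state, ()):
            votes[image_id] += 1.0  # assumption: uniform weight per matching state
    # Keep the images with the most votes (ties broken by image id).
    candidates = sorted(votes, key=lambda i: (-votes[i], i))[:num_candidates]
    # Compute actual distances only for the surviving candidates.
    ranked = sorted(candidates, key=lambda i: (l1_distance(histograms[i], query), i))
    return ranked[:k]
```

Only the candidates that survive voting have their full distance computed, which is where the saving would come from when each histogram has thousands of mostly empty bins.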


Face Recognition, Face Image, Indexing Structure, Query Image, State Expansion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jiajun Liu (1)
  • Zi Huang (1, 2)
  • Heng Tao Shen (1)
  • Xiaofang Zhou (1, 2)
  1. School of ITEE, University of Queensland, Australia
  2. Queensland Research Laboratory, National ICT Australia
