Improved embedding product quantization

  • The-Anh PhamEmail author
Original Paper


Real-time object matching and recognition is a challenging task in computer vision probably due to the extensively computational overload posed by large and high dimensional data space. Indexing approaches can help achieving thousands of times in speedups when comparing to sequential search. In this work, we propose a novel usage of product quantization and hierarchical clustering so that search speedups can be even improved further. To validate the proposed indexing algorithm, a number of experiments have been carried out on different datasets. Experimental results demonstrate that the proposed method works very efficiently and far outperforms many other state-of-the-art techniques.


Product quantization Hierarchical clustering tree Approximate nearest search 



This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.01-2016.01


  1. 1.
    Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pp. 459–468 (2006)Google Scholar
  2. 2.
    Babenko, A., Lempitsky, V.: Additive quantization for extreme vector compression. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 931–938 (2014)Google Scholar
  3. 3.
    Babenko, A., Lempitsky, V.: The inverted multi-index. IEEE Trans. Pattern Anal. Mach. Intell. 37(6), 1247–1260 (2015)CrossRefGoogle Scholar
  4. 4.
    Babenko, A., Lempitsky, V.: Tree quantization for large-scale similarity search and classification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4240–4248 (2015)Google Scholar
  5. 5.
    Babenko, A., Lempitsky, V.: Product split trees. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6316–6324 (2017)Google Scholar
  6. 6.
    Dong, X., Shen, J., Shao, L.: Hierarchical superpixel-to-pixel dense matching. IEEE Trans. Circuits Syst. Video Technol. 27(12), 2518–2526 (2017)CrossRefGoogle Scholar
  7. 7.
    Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231. AAAI Press (1996)Google Scholar
  8. 8.
    Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3(3), 209–226 (1977)CrossRefzbMATHGoogle Scholar
  9. 9.
    Ge, T., He, K., Ke, Q., Sun, J.: Optimized product quantization. IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 744–755 (2014)CrossRefGoogle Scholar
  10. 10.
    Gordo, A., Almazan, J., Revaud, J., Larlus, D.: Deep image retrieval: learning global representations for image search. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) European Conference on Computer Vision. ECCV 2016. Lecture Notes in Computer Science, vol. 9910, pp. 241–257 (2016)Google Scholar
  11. 11.
    Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pp. 369–376 (2006)Google Scholar
  12. 12.
    Han, J., Zhang, D., Cheng, G., Liu, N., Xu, D.: Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Process. Mag. 35(1), 84–100 (2018)CrossRefGoogle Scholar
  13. 13.
    He, J., Liu, W., Chang, S.F.: Scalable similarity search with optimized kernel hashing. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pp. 1129–1138 (2010)Google Scholar
  14. 14.
    Hu, P., Ramanan, D.: Finding tiny faces. CoRR arXiv:1612.04402 (2016)
  15. 15.
    Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC’98, pp. 604–613 (1998)Google Scholar
  16. 16.
    Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011)CrossRefGoogle Scholar
  17. 17.
    Kalantidis, Y., Avrithis, Y.: Locally optimized product quantization for approximate nearest neighbor search. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 2329–2336. Columbus, Ohio (2014)Google Scholar
  18. 18.
    Kaufman, L., Rousseeuw, P.J.: Clustering by means of medoids (1987)Google Scholar
  19. 19.
    Klein, B., Wolf, L.: In defense of product quantization. CoRR arXiv:1711.08589 (2017)
  20. 20.
    Krizhevsky, A., Hinton, G.E.: Using very deep autoencoders for content-based image retrieval. In: Proceedings of the European Symposium on Artificial Neural Networks (ESANN) (2011)Google Scholar
  21. 21.
    Kulis, B., Grauman, K.: Kernelized locality-sensitive hashing for scalable image search. In: IEEE International Conference on Computer Vision, ICCV’09, pp. 2130–2137 (2009)Google Scholar
  22. 22.
    Li, L., Hu, Q., Han, Y., Li, X.: Distribution sensitive product quantization. IEEE Trans. Circuits Syst. Video Technol. (2017)
  23. 23.
    Liu, H., Wang, R., Shan, S., Chen, X.: Deep supervised hashing for fast image retrieval. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 2064–2072 (2016)Google Scholar
  24. 24.
    Liu, L., Ouyang, W., Wang, X., Fieguth, P.W., Chen, J., Liu, X., Pietikäinen, M.: Deep learning for generic object detection: a survey. CoRR arXiv:1809.02165 (2018)
  25. 25.
    Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)MathSciNetCrossRefGoogle Scholar
  26. 26.
    Macqueen, J.B.: Some methods for classification and analysis of multivariate observations. In: 5-th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)Google Scholar
  27. 27.
    McNames, J.: A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE Trans. Pattern Anal. Mach. Intell. 23(9), 964–976 (2001)CrossRefGoogle Scholar
  28. 28.
    Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: Proceedings of International Conference on Computer Vision Theory and Applications, VISAPP’09, pp. 331–340 (2009)Google Scholar
  29. 29.
    Muja, M., Lowe, D.G.: Fast matching of binary features. In: Proceedings of the Ninth Conference on Computer and Robot Vision, CRV’12, pp. 404–410 (2012)Google Scholar
  30. 30.
    Muja, M., Lowe, D.G.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2227–2240 (2014)CrossRefGoogle Scholar
  31. 31.
    Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Volume 2, CVPR’06, pp. 2161–2168 (2006)Google Scholar
  32. 32.
    Norouzi, M., Fleet, D.J.: Cartesian k-means. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13, pp. 3017–3024 (2013)Google Scholar
  33. 33.
    Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)CrossRefzbMATHGoogle Scholar
  34. 34.
    Panigrahy, R.: Entropy based nearest neighbor search in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA’06, pp. 1186–1195 (2006)Google Scholar
  35. 35.
    Pham, T.A.: Pair-wisely optimized clustering tree for feature indexing. Comput. Vis. Image Underst. 154(1), 35–47 (2017)CrossRefGoogle Scholar
  36. 36.
    Pham, T.A., Barrat, S., Delalandre, M., Ramel, J.Y.: An efficient indexing scheme based on linked-node m-ary tree structure. In: 17th International Conference on Image Analysis and Processing (ICIAP 2013), LNCS, vol. 8156, pp. 752–762 (2013)Google Scholar
  37. 37.
    Pham, T.A., Do, N.T.: Embedding hierarchical clustering in product quantization for feature indexing. Multimedia Tools and Applications. (2018)
  38. 38.
    Qian, X., Zhao, Y., Han, J.: Image location estimation by salient region matching. IEEE Trans. Image Process. 24(11), 4348–4358 (2015)MathSciNetCrossRefGoogle Scholar
  39. 39.
    Qin, X., Shen, J., Mao, X., Li, X., Jia, Y.: Robust match fusion using optimization. IEEE Trans. Cybern. 45(8), 1549–1560 (2015)CrossRefGoogle Scholar
  40. 40.
    Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. CoRR arXiv:1503.03832 (2015)
  41. 41.
    Shen, J., Hao, X., Liang, Z., Liu, Y., Wang, W., Shao, L.: Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE Trans. Image Process. 25(12), 5933–5942 (2016)MathSciNetCrossRefGoogle Scholar
  42. 42.
    Silpa-Anan, C., Hartley, R.: Optimised KD-trees for fast image descriptor matching. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR’08, pp. 1–8 (2008)Google Scholar
  43. 43.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR arXiv:1409.1556 (2014)
  44. 44.
    Sivic, J., Zisserman, A.: Video google: a text retrieval approach to object matching in videos. In: Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pp. 1470–1477 (2003)Google Scholar
  45. 45.
    Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., Li, J.: Deep learning for content-based image retrieval: a comprehensive study. In: ACM Multimedia (2014)Google Scholar
  46. 46.
    Wang, H., Cai, Y., Zhang, Y., Pan, H., Lv, W., Han, H.: Deep learning for image retrieval: what works and what doesn’t. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW) pp. 1576–1583 (2015)Google Scholar
  47. 47.
    Wang, W., Shen, J.: Deep visual attention prediction. IEEE Trans. Image Process. 27(5), 2368–2378 (2018)MathSciNetCrossRefGoogle Scholar
  48. 48.
    Wang, W., Shen, J., Shao, L.: Video salient object detection via fully convolutional networks. IEEE Trans. Image Process. 27(1), 38–49 (2018)MathSciNetCrossRefGoogle Scholar
  49. 49.
    Wieschollek, P., Wang, O., Sorkine-Hornung, A., Lensch, H.P.: Efficient large-scale approximate nearest neighbor search on the gpu. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2027–2035 (2016)Google Scholar
  50. 50.
    Wu, C., Karanasou, P., Gales, M.J.F., Sim, K.C.: Stimulated deep neural network for speech recognition. In: Proceedings of InterSpeech, pp. 400–404 (2016)Google Scholar
  51. 51.
    Wu, X., He, R., Sun, Z.: A lightened CNN for deep face representation. CoRR arXiv:1511.02683 (2015)
  52. 52.
    Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., Li, S.Z.: Faceboxes: a CPU real-time face detector with high accuracy. CoRR arXiv:1708.05234 (2017)
  53. 53.
    Zhang, T., Du, C., Wang, J.: Composite quantization for approximate nearest neighbor search. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 838–846 (2014)Google Scholar
  54. 54.
    Zhang, T., Qi, G.J., Tang, J., Wang, J.: Sparse composite quantization. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR’15), pp. 4548–4556 (2015)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Hong Duc University (HDU)Thanh Hoa CityVietnam

Personalised recommendations