Approximate Gaussian Mixtures for Large Scale Vocabularies

  • Yannis Avrithis
  • Yannis Kalantidis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7574)


We introduce a clustering method that combines the flexibility of Gaussian mixtures with the scaling properties needed to construct visual vocabularies for image retrieval. It is a variant of expectation-maximization that can converge rapidly while dynamically estimating the number of components. We employ approximate nearest neighbor search to speed up the E-step and exploit the iterative nature of the algorithm to make search incremental, boosting both speed and precision. We achieve superior performance in large scale retrieval while being as fast as the best known approximate k-means.
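
As a rough illustration of the idea summarized above, the following Python sketch runs EM on isotropic Gaussians, restricts each point's E-step to its k nearest components, and drops components whose mixing weight becomes very small. It is not the authors' algorithm. The brute-force `k_nearest` function only stands in for approximate nearest neighbor search, the weight-threshold purge is a placeholder assumption rather than the paper's criterion for estimating the number of components, and every name and parameter (`em_step`, `k`, `min_weight`, the toy data) is hypothetical.

```python
import math
import random


def k_nearest(x, means, k):
    """Return (squared distance, index) for the k means closest to x.

    Brute-force search, used here only as a stand-in for approximate search.
    """
    dists = [(sum((a - b) ** 2 for a, b in zip(x, m)), j)
             for j, m in enumerate(means)]
    return sorted(dists)[:k]


def em_step(points, means, variances, weights, k=2, min_weight=0.01):
    """One EM iteration for isotropic Gaussians with a k-nearest E-step."""
    n, dim, n_comp = len(points), len(points[0]), len(means)
    resp_sum = [0.0] * n_comp                        # sum of responsibilities
    mean_sum = [[0.0] * dim for _ in range(n_comp)]  # responsibility-weighted points
    sq_sum = [0.0] * n_comp                          # weighted sq. distances to old means

    for x in points:
        # E-step: responsibilities are computed only over the k nearest components.
        cand = k_nearest(x, means, k)
        logp = [math.log(weights[j])
                - 0.5 * dim * math.log(2 * math.pi * variances[j])
                - d2 / (2 * variances[j])
                for d2, j in cand]
        top = max(logp)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - top) for v in logp]
        total = sum(unnorm)
        for (d2, j), u in zip(cand, unnorm):
            r = u / total
            resp_sum[j] += r
            sq_sum[j] += r * d2
            for t in range(dim):
                mean_sum[j][t] += r * x[t]

    # M-step, skipping (purging) components with too little support.
    # The threshold rule is an assumption made for this sketch.
    new_means, new_vars, new_weights = [], [], []
    for j in range(n_comp):
        if resp_sum[j] / n < min_weight:
            continue
        mu = [s / resp_sum[j] for s in mean_sum[j]]
        shift = sum((a - b) ** 2 for a, b in zip(mu, means[j]))
        # Variance around the new mean, derived from distances to the old mean.
        var = max((sq_sum[j] / resp_sum[j] - shift) / dim, 1e-6)
        new_means.append(mu)
        new_vars.append(var)
        new_weights.append(resp_sum[j] / n)
    total_w = sum(new_weights)
    return new_means, new_vars, [w / total_w for w in new_weights]


# Toy data: two 2-D clusters, deliberately over-initialised with four components.
rng = random.Random(0)
centers = [(0.0, 0.0), (5.0, 5.0)]
points = [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
          for cx, cy in centers for _ in range(100)]
means = [[0.5, 0.5], [-0.5, 0.2], [4.5, 5.2], [5.5, 4.8]]
variances = [1.0] * 4
weights = [0.25] * 4
for _ in range(20):
    means, variances, weights = em_step(points, means, variances, weights)
```

Restricting the E-step to a few nearby components is what makes the per-point cost independent of the total number of components. In a large scale setting the brute-force search would be replaced by an approximate index, which this sketch does not attempt.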


Keywords: Gaussian mixtures, expectation-maximization, visual vocabularies, large scale clustering, approximate nearest neighbor search



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yannis Avrithis (1)
  • Yannis Kalantidis (1)
  1. National Technical University of Athens, Greece
