Optimizing visual dictionaries for effective image retrieval

  • K. S. Arun
  • V. K. Govindan
Regular Paper


Characterizing images by high-level concepts derived from a learned visual dictionary is widely used in image classification and retrieval. This paper deals with inferring discriminative visual dictionaries for effective image retrieval and examines a non-negative visual dictionary learning scheme in this direction. More specifically, it proposes a non-negative matrix factorization framework that optimizes the dictionary under an \(\ell _0\)-sparseness constraint on the coefficient matrix. The framework is a two-step iterative process composed of a sparse encoding stage and a dictionary enhancement stage: starting from an initial estimate, the visual dictionary is updated in each iteration with the proposed \(\ell _0\)-constrained gradient projection algorithm. A desirable attribute of this formulation is an adaptive sequential dictionary initialization procedure, which leads to a sharp drop in the approximation error and faster convergence. Finally, the proposed dictionary optimization scheme is used to derive a compact image representation for the retrieval task. A new image signature is obtained by projecting local descriptors onto the basis elements of the optimized visual dictionary and then aggregating the resulting sparse encodings into a single feature vector. Experimental results on various benchmark datasets show that the proposed system can infer enhanced visual dictionaries and that the derived image feature vector can achieve better retrieval results than state-of-the-art techniques.


Keywords: Visual dictionary · Image retrieval · Sparse coding · Matrix factorization
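The abstract describes a loop that alternates sparse encoding and dictionary enhancement under non-negativity and an \(\ell _0\) budget, followed by projecting local descriptors onto the learned atoms and aggregating the codes. The sketch below is a minimal, illustrative stand-in under stated assumptions, not the authors' algorithm. It assumes the objective \(\min_{W \ge 0,\, H \ge 0} \tfrac{1}{2}\|X - WH\|_F^2\) subject to \(\|h_j\|_0 \le L\) for every column \(h_j\) of \(H\). Sparse encoding uses a greedy non-negative matching pursuit; the dictionary step is a plain projected-gradient update onto \(W \ge 0\) with step size \(1/\|HH^\top\|_F\), which may differ from the paper's \(\ell _0\)-constrained gradient projection; initialization samples training vectors at random instead of the paper's adaptive sequential procedure; and the image signature sum-pools the codes and L2-normalizes them. All function names (`sparse_encode`, `update_dictionary`, `learn_dictionary`, `image_signature`) are hypothetical.

```python
import math
import random

# Vectors are plain lists of floats; a dictionary is a list of K unit-norm atoms.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def sparse_encode(x, atoms, L):
    """Greedy non-negative matching pursuit: at most L atoms receive a nonzero weight."""
    h = [0.0] * len(atoms)
    r = list(x)  # residual x - W h
    for _ in range(L):
        # atom most positively correlated with the current residual
        k, c = max(((j, dot(r, a)) for j, a in enumerate(atoms)), key=lambda t: t[1])
        if c <= 1e-12:  # no atom can reduce the residual with a positive weight
            break
        h[k] += c
        r = [ri - c * ai for ri, ai in zip(r, atoms[k])]
    return h


def update_dictionary(X, atoms, H):
    """One projected-gradient step on 0.5*||X - W H||_F^2 w.r.t. W, projected onto W >= 0."""
    K, d = len(atoms), len(atoms[0])
    # residual W h_n - x_n for every training vector
    resid = [[sum(h[k] * atoms[k][i] for k in range(K)) - x[i] for i in range(d)]
             for x, h in zip(X, H)]
    # ||H H^T||_F upper-bounds the gradient's Lipschitz constant ||H H^T||_2
    lip = math.sqrt(sum(sum(h[k] * h[l] for h in H) ** 2
                        for k in range(K) for l in range(K)))
    if lip == 0.0:
        return atoms
    step = 1.0 / lip
    new_atoms = []
    for k in range(K):
        grad = [sum(r[i] * h[k] for r, h in zip(resid, H)) for i in range(d)]
        a = [max(0.0, w - step * g) for w, g in zip(atoms[k], grad)]
        n = math.sqrt(dot(a, a))
        # renormalize to unit length (codes are recomputed next iteration);
        # keep the previous atom if the projection zeroed it out
        new_atoms.append([v / n for v in a] if n > 0 else atoms[k])
    return new_atoms


def learn_dictionary(X, K, L, iters=20, seed=0):
    """Alternate sparse encoding and dictionary enhancement (illustrative only)."""
    rng = random.Random(seed)
    # Stand-in initialization: K randomly chosen, normalized training vectors
    # (the paper's adaptive sequential initialization is not reproduced here).
    atoms = [normalize(x) for x in rng.sample(X, K)]
    for _ in range(iters):
        H = [sparse_encode(x, atoms, L) for x in X]  # sparse encoding stage
        atoms = update_dictionary(X, atoms, H)       # dictionary enhancement stage
    return atoms


def image_signature(descriptors, atoms, L):
    """Hypothetical aggregation: sum-pool the sparse codes of all descriptors, then L2-normalize."""
    pooled = [0.0] * len(atoms)
    for x in descriptors:
        for k, c in enumerate(sparse_encode(x, atoms, L)):
            pooled[k] += c
    return normalize(pooled)


# Toy usage: 40 random non-negative 8-D "descriptors", 6 atoms, at most 2 atoms per code.
rng = random.Random(1)
train = [[rng.random() for _ in range(8)] for _ in range(40)]
D = learn_dictionary(train, K=6, L=2, iters=10)
codes = [sparse_encode(x, D, 2) for x in train]
sig = image_signature(train[:10], D, L=2)
```

Because the atoms are unit-norm and a coefficient is added only when its correlation with the residual is positive, each matching-pursuit step can only shrink the residual. Every code therefore respects the \(\ell _0\) budget and reconstructs its descriptor no worse than the all-zero code; a faithful implementation would substitute the paper's own encoder, update rule, and initialization.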



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology, Calicut, India
