Abstract
Content-Based Image Retrieval based on local features is computationally expensive because of the complexity of both the extraction and the matching of local features. On the one hand, the cost of extracting, representing, and comparing local visual descriptors has been dramatically reduced by recently proposed binary local features. On the other hand, aggregation techniques provide a meaningful summary of all the features extracted from an image into a single descriptor, allowing us to speed up and scale up image search. Only a few recent works have combined these two research directions, defining aggregation methods for binary local features in order to leverage the advantages of both approaches. In this paper, we report an extensive comparison among state-of-the-art aggregation methods applied to binary features. Then, we mathematically formalize the application of Fisher Kernels to Bernoulli Mixture Models. Finally, we investigate the combination of aggregated binary features with the emerging Convolutional Neural Network (CNN) features. Our results show that aggregation methods on binary features are effective and represent a worthwhile alternative to direct matching. Moreover, combining the CNN with the Fisher Vector (FV) built upon binary features allowed us to obtain a relative improvement over the CNN results that is in line with that recently obtained using the combination of the CNN with the FV built upon SIFTs. The advantage of using the FV built upon binary features is that the extraction of binary features is about two orders of magnitude faster than that of SIFTs.
Notes
A Bernoulli distribution \(p(x) = \mu^{x}(1-\mu)^{1-x}\) of parameter μ can be written as an exponential distribution \(p(x) = \exp\left(\eta x - \log(1 + e^{\eta})\right)\), where \(\eta = \log \left (\frac {\mu }{1-\mu }\right )\) is the natural parameter. In [62] the score function is computed considering the gradient w.r.t. the natural parameter η, while in this paper we used the gradient w.r.t. the standard parameter μ of the Bernoulli (as also done in [72]).
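The equivalence of the two parametrizations is easy to check numerically; the following sketch compares them directly (the helper names are illustrative):

```python
import math

def bernoulli_pmf(x, mu):
    # Standard form: p(x) = mu^x * (1 - mu)^(1 - x), with x in {0, 1}
    return mu**x * (1 - mu)**(1 - x)

def bernoulli_exp_family(x, eta):
    # Exponential-family form: p(x) = exp(eta * x - log(1 + e^eta))
    return math.exp(eta * x - math.log(1 + math.exp(eta)))

mu = 0.3
eta = math.log(mu / (1 - mu))  # natural parameter
for x in (0, 1):
    assert abs(bernoulli_pmf(x, mu) - bernoulli_exp_family(x, eta)) < 1e-12
```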
To search a database for the objects similar to a query, we can use either a similarity function or a distance function. In the first case, we search for the objects with the greatest similarity to the query; in the latter case, for the objects with the lowest distance from the query. A similarity function is said to be equivalent to a distance function if the ranked list of the results to a query is the same. For example, the Euclidean distance between two vectors (\(\ell_2(x_1,x_2)=\|x_1-x_2\|_2\)) is equivalent to the cosine similarity (\(s_{\text{cos}}(x_1,x_2)=(x_1\cdot x_2)/(\|x_1\|_2\|x_2\|_2)\)) whenever the vectors are L2-normalized (i.e. \(\|x_1\|_2=\|x_2\|_2=1\)). In fact, in such a case, \(s_{\text{cos}}(x_{1},x_{2})=1-\frac {1}{2}{\ell _{2}(x_{1},x_{2})}^{2}\), which implies that the ranked list of the results to a query is the same (i.e., \(\ell_2(x_1,x_2)\le\ell_2(x_1,x_3)\) iff \(s_{\text{cos}}(x_1,x_2)\ge s_{\text{cos}}(x_1,x_3)\) for all \(x_1,x_2,x_3\)).
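The identity \(s_{\text{cos}} = 1 - \frac{1}{2}\ell_2^2\) on L2-normalized vectors can be verified with a short sketch (the helper names are illustrative):

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

x1 = l2_normalize([1.0, 2.0, 3.0])
x2 = l2_normalize([2.0, 0.5, -1.0])
# On L2-normalized vectors: s_cos(x1, x2) = 1 - 0.5 * l2(x1, x2)^2
assert abs(cosine(x1, x2) - (1 - 0.5 * euclidean(x1, x2) ** 2)) < 1e-12
```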
An elementary matrix \(E(u,v,\sigma) = I-\sigma u v^{H}\) is non-singular if and only if \(\sigma v^{H} u \neq 1\), and in this case the inverse is \(E(u,v,\sigma)^{-1} = E(u,v,\tau)\), where \(\tau = \sigma/(\sigma v^{H} u-1)\). More details on this topic can be found in [29].
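For the real-valued case (where \(v^{H} = v^{\top}\)), the inversion formula can be checked numerically; this sketch assumes NumPy and uses illustrative names:

```python
import numpy as np

def elementary(u, v, sigma):
    # E(u, v, sigma) = I - sigma * u v^T (real case, so v^H = v^T)
    return np.eye(len(u)) - sigma * np.outer(u, v)

rng = np.random.default_rng(0)
u, v = rng.standard_normal(4), rng.standard_normal(4)
sigma = 0.7
assert abs(sigma * (v @ u) - 1) > 1e-9   # non-singularity condition
tau = sigma / (sigma * (v @ u) - 1)      # parameter of the inverse
E, E_inv = elementary(u, v, sigma), elementary(u, v, tau)
assert np.allclose(E @ E_inv, np.eye(4))
```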
References
Alcantarilla PF, Nuevo J, Bartoli A (2013) Fast explicit diffusion for accelerated features in nonlinear scale spaces. British Machine Vision Conference (BMVC)
Amato G, Falchi F, Gennaro C, Vadicamo L (2016) Deep Permutations: Deep Convolutional Neural Networks and Permutation-Based Indexing. Springer International Publishing, Cham, pp 93–106. doi:10.1007/978-3-319-46759-7_7
Amato G, Falchi F, Vadicamo L (2016) How effective are aggregation methods on binary features? Proceedings of the 11th joint conference on computer vision, imaging and computer graphics theory and applications, vol 4, pp 566–573
Amato G, Falchi F, Vadicamo L (2016) Visual recognition of ancient inscriptions using Convolutional Neural Network and Fisher Vector. J Comput Cult Herit (JOCCH) 9(4), Article 21 (December 2016), 24 pages. doi:10.1145/2964911
Arandjelovic R, Zisserman A (2012) Three things everyone should know to improve object retrieval 2012 IEEE conference on Computer vision and pattern recognition (CVPR), pp 2911–2918
Arandjelovic R, Zisserman A (2013) All about VLAD 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2013.207, pp 1578–1585
Babenko A, Slesarev A, Chigorin A, Lempitsky V (2014) Neural codes for image retrieval Computer Vision–ECCV 2014. doi:10.1007/978-3-319-10590-1_38. Springer, pp 584–599
Bay H, Tuytelaars T, Van Gool L (2006) Surf: Speeded up robust features. In: Leonardis A, Bischof H, Pinz A (eds) Computer Vision - ECCV 2006, Lecture Notes in Computer Science. doi:10.1007/11744023_32, vol 3951. Springer, Berlin, pp 404–417
Bing images. http://www.bing.com/images/
Bishop CM (2006) Pattern recognition and machine learning. Information science and statistics. Springer
Boureau YL, Bach F, LeCun Y, Ponce J (2010) Learning mid-level features for recognition 2010 IEEE conference on Computer vision and pattern recognition (CVPR), pp 2559–2566
Calonder M, Lepetit V, Strecha C, Fua P (2010) Brief: Binary robust independent elementary features. In: Daniilidis K, Maragos P, Paragios N (eds) Computer Vision - ECCV 2010, Lecture Notes in Computer Science, vol 6314. Springer, Berlin Heidelberg, pp 778–792
Chandrasekhar V, Lin J, Morère O, Goh H, Veillard A (2015) A practical guide to CNNs and Fisher Vectors for image instance retrieval. arXiv:1508.02496
Chen D, Tsai S, Chandrasekhar V, Takacs G, Chen H, Vedantham R, Grzeszczuk R, Girod B (2011) Residual enhanced visual vectors for on-device image matching 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR). doi:10.1016/j.sigpro.2012.06.005, pp 850–854
Chum O, Philbin J, Sivic J, Isard M, Zisserman A (2007) Total recall: Automatic query expansion with a generative feature model for object retrieval IEEE 11th international conference on Computer vision, 2007. ICCV 2007, pp 1–8
Csurka G, Dance C, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. Workshop on statistical learning in computer vision. ECCV 1(1-22):1–2
Datta R, Li J, Wang JZ (2005) Content-based image retrieval: Approaches and trends of the new age Proceedings of the 7th ACM SIGMM international workshop on multimedia information retrieval, MIR ’05. ACM, New York, pp 253–262
Delhumeau J, Gosselin PH, Jégou H, Pérez P (2013) Revisiting the VLAD image representation Proceedings of the 21st ACM International Conference on Multimedia, MM 2013. doi:10.1145/2502081.2502171. ACM, New York, pp 653–656
Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009. doi:10.1109/CVPR.2009.5206848, pp 248–255
Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2013) Decaf: A deep convolutional activation feature for generic visual recognition. arXiv:1310.1531
Galvez-Lopez D, Tardos J (2011) Real-time loop detection with bags of binary words. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011, pp 51–58
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. http://www.deeplearningbook.org. Book in preparation for MIT Press
Google googles. http://www.google.com/mobile/goggles/
Google images. https://images.google.com/
Grana C, Borghesani D, Manfredi M, Cucchiara R (2013) A fast approach for integrating ORB descriptors in the bag of words model. In: Snoek CGM, Kennedy LS, Creutzburg R, Akopian D, Wüller D, Matherson KJ, Georgiev TG, Lumsdaine A (eds) IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics
Gray RM, Neuhoff DL (1998) Quantization. IEEE Trans Inf Theory 44 (6):2325–2383. doi:10.1109/18.720541
Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147–160. doi:10.1002/j.1538-7305.1950.tb00463.x
Heinly J, Dunn E, Frahm JM (2012) Comparative evaluation of binary features Computer vision - ECCV 2012, lecture notes in computer science. Springer, Berlin, pp 759–773
Householder A (1964) The Theory of Matrices in Numerical Analysis. A Blaisdell Book in Pure and Applied Sciences: Introduction to Higher Mathematics. Blaisdell Publishing Company
Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval. ACM, New York
Jaakkola T, Haussler D (1998) Exploiting generative models in discriminative classifiers In Advances in Neural Information Processing Systems. http://dl.acm.org/citation.cfm?id=340534.340715, vol 11. MIT Press, pp 487–493
Jégou H, Douze M, Schmid C (2008) Hamming embedding and weak geometric consistency for large scale image search. In: Forsyth D, Torr P, Zisserman A (eds) European Conference on Computer Vision, LNCS, vol I. Springer, pp 304–317
Jégou H, Douze M, Schmid C (2010) Improving bag-of-features for large scale image search. Int J Comput Vis 87(3):316–336. doi:10.1007/s11263-009-0285-2
Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation IEEE Conference on Computer Vision & Pattern Recognition. doi:10.1109/CVPR.2010.5540039
Jégou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans Pattern Anal Mach Intell 33 (1):117–128. doi:10.1109/TPAMI.2010.57
Jégou H, Perronnin F, Douze M, Sànchez J, Pérez P, Schmid C (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 34(9):1704–1716. doi:10.1109/TPAMI.2011.235
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding Proceedings of the ACM International Conference on Multimedia. doi:10.1145/2647868.2654889. ACM, pp 675–678
Kaufman L, Rousseeuw P (1987) Clustering by means of medoids. In: Dodge Y (ed) An introduction to l1-norm based statistical data analysis, computational statistics & data analysis, vol 5
Krapac J, Verbeek J, Jurie F (2011) Modeling Spatial Layout with Fisher Vectors for Image Categorization ICCV 2011 - International conference on computer vision. IEEE, Barcelona, pp 1487–1494
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges C, Bottou L, Weinberger K (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc., pp 1097–1105
Lai H, Pan Y, Liu Y, Yan S (2015) Simultaneous feature learning and hash coding with deep neural networks The IEEE conference on computer vision and pattern recognition (CVPR)
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521 (7553):436–444. doi:10.1038/nature14539
Lee S, Choi S, Yang H (2015) Bag-of-binary-features for fast image representation. Electron Lett 51(7):555–557
Leutenegger S, Chli M, Siegwart R (2011) Brisk: Binary robust invariant scalable keypoints IEEE International Conference on Computer vision (ICCV), 2011, pp 2548–2555
Levi G, Hassner T (2015) LATCH: learned arrangements of three patch codes. CoRR abs/1501.03719
Lin K, Yang HF, Hsiao JH, Chen CS (2015) Deep learning of binary hash codes for fast image retrieval The IEEE conference on computer vision and pattern recognition (CVPR) workshops
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137. doi:10.1109/TIT.1982.1056489
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110. doi:10.1023/B:VISI.0000029664.99615.94
McLachlan G, Peel D (2000) Finite Mixture Models. Wiley series in probability and statistics. Wiley
Miksik O, Mikolajczyk K (2012) Evaluation of local detectors and descriptors for fast feature matching 2012 21st international conference on Pattern recognition (ICPR), pp 2681–2684
Perd’och M, Chum O, Matas J (2009) Efficient representation of local geometry for large scale object retrieval IEEE Conference on Computer vision and pattern recognition, 2009. CVPR 2009, pp 9–16
Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR ’07. doi:10.1109/CVPR.2007.383266, pp 1–8
Perronnin F, Larlus D (2015) Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3743–3752
Perronnin F, Liu Y, Sànchez J, Poirier H (2010) Large-scale image retrieval with compressed fisher vectors 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2010.5540009, pp 3384–3391
Perronnin F, Sànchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification Computer Vision - ECCV 2010, Lecture Notes in Computer Science. doi:10.1007/978-3-642-15561-1_11, vol 6314. Springer, Berlin, pp 143–156
Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2007.383172, pp 1–8
Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2008) Lost in quantization: Improving particular object retrieval in large scale image databases IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008. doi:10.1109/CVPR.2008.4587635, pp 1–8
Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). doi:10.1109/CVPRW.2014.131. IEEE, pp 512–519
Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. 2011 IEEE International Conference on Computer Vision (ICCV), pp 2564–2571
Salton G, McGill MJ (1986) Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York
Sànchez J, Redolfi J (2015) Exponential family fisher vector for image classification. Pattern Recogn Lett 59:26–32. doi:10.1016/j.patrec.2015.03.010
Sànchez J, Perronnin F, Mensink T, Verbeek J (2013) Image classification with the fisher vector: Theory and practice. Int J Comput Vis 105 (3):222–245. doi:10.1007/s11263-013-0636-x
Simonyan K, Vedaldi A, Zisserman A (2013) Deep fisher networks for large-scale image classification. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems 26. Curran Associates, Inc., pp 163–171
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Sivic J, Zisserman A (2003) Video Google: A text retrieval approach to object matching in videos. Proceedings of the Ninth IEEE International Conference on Computer Vision, ICCV '03. doi:10.1109/ICCV.2003.1238663, vol 2. IEEE Computer Society, pp 1470–1477
Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Sydorov V, Sakurada M, Lampert CH (2014) Deep fisher kernels - end to end learning of the fisher kernel gmm parameters The IEEE Conference on Computer vision and pattern recognition (CVPR)
Tolias G, Avrithis Y (2011) Speeded-up, relaxed spatial matching 2011 IEEE International Conference on Computer Vision (ICCV). doi:10.1109/ICCV.2011.6126427, pp 1653–1660
Tolias G, Furon T, Jégou H (2014) Orientation covariant aggregation of local descriptors with embeddings. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision - ECCV 2014, Lecture Notes in Computer Science, vol 8694. Springer International Publishing, pp 382–397
Tolias G, Jégou H (2013) Local visual query expansion: Exploiting an image collection to refine local descriptors. Research Report RR-8325. https://hal.inria.fr/hal-00840721
Uchida Y, Sakazawa S (2013) Image retrieval with fisher vectors of binary features 2013 2nd IAPR asian conference on Pattern recognition (ACPR), pp 23–28
Ullman S (1996) High-Level Vision: Object Recognition and Visual Cognition. MIT Press
Uricchio T, Bertini M, Seidenari L, Del Bimbo A (2015) Fisher encoded convolutional bag-of-windows for efficient image retrieval and social image tagging The IEEE International Conference on Computer Vision (ICCV) Workshops
van Gemert JC, Geusebroek JM, Veenman CJ, Smeulders AW (2008) Kernel codebooks for scene categorization. In: Forsyth D, Torr P, Zisserman A (eds) Computer Vision - ECCV 2008, Lecture Notes in Computer Science, vol 5304. Springer, Berlin, pp 696–709
Van Opdenbosch D, Schroth G, Huitl R, Hilsenbeck S, Garcea A, Steinbach E (2014) Camera-based indoor positioning using scalable streaming of compressed binary image signatures IEEE International Conference on Image Processing
Wang J, Yang J, Yu K, Lv F, Huang T, Gong Y (2010) Locality-constrained linear coding for image classification 2010 IEEE conference on Computer vision and pattern recognition (CVPR), pp 3360–3367
Witten IH, Moffat A, Bell TC (1999) Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann
Yang J, Yu K, Gong Y, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification IEEE conference on Computer vision and pattern recognition, 2009. CVPR 2009, pp 1794–1801
Yue-Hei Ng J, Yang F, Davis LS (2015) Exploiting local features from deep networks for image retrieval The IEEE conference on computer vision and pattern recognition (CVPR) workshops
Zezula P, Amato G, Dohnal V, Batko M (2006) Similarity Search: The Metric Space Approach. Advances in Database Systems, vol 32. Springer
Zhang Y, Zhu C, Bres S, Chen L (2013) Encoding local binary descriptors by bag-of-features with hamming distance for visual object categorization. In: Serdyukov P, Braslavski P, Kuznetsov S, Kamps J, Rüger S, Agichtein E, Segalovich I, Yilmaz E (eds) Advances in Information Retrieval, Lecture Notes in Computer Science, vol 7814. Springer, Berlin, pp 630–641
Zhao W, Jégou H, Gravier G (2013) Oriented pooling for dense and non-dense rotation-invariant features BMVC - 24Th british machine vision conference
Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger K (eds) Advances in neural information processing systems, vol 27, Curran Associates, Inc., pp 487–495
Acknowledgments
This work was partially funded by: EAGLE, Europeana network of Ancient Greek and Latin Epigraphy, co-funded by the European Commission, CIP-ICT-PSP.2012.2.1 - Europeana and creativity, Grant Agreement n. 325122; and Smart News, Social sensing for breaking news, co-funded by the Tuscany region under the FAR-FAS 2014 program, CUP CIPE D58C15000270008.
Appendices
Appendix A: Score vector computation
In the following, we report the computation of the score function \(G_{\lambda }^{X}\), defined as the gradient of the log-likelihood of the data X with respect to the parameters λ of a Bernoulli Mixture Model. Throughout this appendix we use the notation \([\![\cdot]\!]\) for the Iverson bracket, which equals one if its argument is true and zero otherwise.
Under the independence assumption, the Fisher score with respect to the generic parameter \(\lambda_k\) is expressed as \(G_{\lambda_k}^{X} =\sum_{t=1}^{T} \frac {\partial \log p(x_t|\lambda )}{\partial \lambda_k}= \sum_{t=1}^{T} \frac {1} {p(x_t|\lambda )}\frac {\partial }{\partial \lambda_k}\left [\sum_{i=1}^{K} w_i p_i(x_t)\right ]\). To compute \(\frac {\partial }{\partial \lambda_k}\left [\sum_{i=1}^{K} w_{i} p_{i}(x_t)\right ]\), we first observe that
\(\frac{\partial w_i}{\partial \alpha_k} = w_i\left([\![i=k]\!]-w_k\right)\)
and
\(\frac{\partial p_i(x_t)}{\partial \mu_{kd}} = [\![i=k]\!]\, p_k(x_t)\left(\frac{x_{td}}{\mu_{kd}}-\frac{1-x_{td}}{1-\mu_{kd}}\right).\)
Hence, the Fisher score with respect to the parameter \(\alpha_k\) is obtained as
\(G_{\alpha_k}^{X} = \sum_{t=1}^{T}\left(\gamma_t(k)-w_k\right),\)
and the Fisher score related to the parameter \(\mu_{kd}\) is
\(G_{\mu_{kd}}^{X} = \sum_{t=1}^{T}\gamma_t(k)\left(\frac{x_{td}}{\mu_{kd}}-\frac{1-x_{td}}{1-\mu_{kd}}\right),\)
where \(\gamma_t(k)=w_k\, p_k(x_t)/p(x_t|\lambda)\) denotes the occupancy probability.
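As a sanity check, the α-score can be verified against a finite-difference gradient of the BMM log-likelihood; the sketch below assumes a soft-max parametrization of the mixing weights and uses a tiny, hypothetical model:

```python
import math

# Tiny Bernoulli Mixture Model: K = 2 components, D = 2 binary dimensions.
# Mixing weights via soft-max of alpha; the values below are illustrative.
alpha = [0.2, -0.5]
mu = [[0.3, 0.8], [0.6, 0.1]]
X = [[1, 0], [0, 0], [1, 1]]

def weights(alpha):
    z = [math.exp(a) for a in alpha]
    return [v / sum(z) for v in z]

def bmm_p(mu_k, x):
    # D-dimensional Bernoulli density of mean mu_k
    return math.prod(mu_k[d] ** x[d] * (1 - mu_k[d]) ** (1 - x[d])
                     for d in range(len(x)))

def log_lik(alpha, mu, X):
    w = weights(alpha)
    return sum(math.log(sum(w[k] * bmm_p(mu[k], x) for k in range(len(w))))
               for x in X)

def gamma(alpha, mu, x):
    # Occupancy probabilities gamma_x(k) = w_k p_k(x) / p(x | lambda)
    w = weights(alpha)
    pk = [w[k] * bmm_p(mu[k], x) for k in range(len(w))]
    return [v / sum(pk) for v in pk]

# Analytic score: G_alpha_k = sum_t (gamma_t(k) - w_k)
w = weights(alpha)
G_alpha = [sum(gamma(alpha, mu, x)[k] - w[k] for x in X) for k in range(2)]

# Central finite difference w.r.t. alpha_0
eps = 1e-6
num = (log_lik([alpha[0] + eps, alpha[1]], mu, X)
       - log_lik([alpha[0] - eps, alpha[1]], mu, X)) / (2 * eps)
assert abs(num - G_alpha[0]) < 1e-6
```

The μ-score can be checked in the same way by perturbing a single \(\mu_{kd}\).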
Appendix B: Approximation of the Fisher information matrix
Our derivation of the FIM is based on the assumption (see also [55, 63]) that for each observation \(x=(x_1,\dots,x_D)\in\{0,1\}^D\) the distribution of the occupancy probability \(\gamma(\cdot) = p(\cdot|x,\lambda)\) is sharply peaked, i.e. there is one Bernoulli index k such that \(\gamma_x(k)\approx 1\) and \(\gamma_x(i)\approx 0\) for all \(i\neq k\). This assumption implies that
\(\gamma_x(i)\,\gamma_x(j) \approx [\![i=j]\!]\,\gamma_x(i),\)
where \([\![\cdot]\!]\) is the Iverson bracket. The elements of the FIM are defined as
\(F_{\lambda_i,\lambda_j} = E_{x\sim p(x|\lambda)}\left[\frac{\partial \log p(x|\lambda)}{\partial \lambda_i}\,\frac{\partial \log p(x|\lambda)}{\partial \lambda_j}\right].\)
Hence, the FIM \(F_\lambda\) is symmetric and can be written as the block matrix
\(F_\lambda = \begin{bmatrix} F_{\mu,\mu} & F_{\mu,\alpha} \\ F_{\mu,\alpha}^{\top} & F_{\alpha,\alpha} \end{bmatrix}.\)
By using the definition of the occupancy probability (i.e. \(\gamma_x(k) = w_k\, p_k(x)/p(x|\lambda)\)) and the fact that \(p_k\) is the distribution of a D-dimensional Bernoulli of mean \(\mu_k\), we have the following useful equalities:
\(E_{x\sim p(x|\lambda)}\left[\gamma_x(k)\right]=w_k \quad\text{and}\quad E_{x\sim p(x|\lambda)}\left[\gamma_x(k)\,x_d\right]=w_k\,\mu_{kd}.\)
It follows that \(F_\lambda\) may be approximated by a block-diagonal matrix, because the mixed blocks are close to the zero matrix: \(F_{\mu_{kd},\alpha_{i}}\approx 0\).
The block \(F_{\mu,\mu}\) can be written as a \(KD\times KD\) diagonal matrix; in fact,
\(F_{\mu_{kd},\mu_{kd}} = \frac{w_k}{\mu_{kd}(1-\mu_{kd})}\) and \(F_{\mu_{kd},\mu_{je}}\approx 0\) for \((k,d)\neq(j,e)\).
The relation (16) shows that the diagonal elements of our FIM approximation are \(w_k/\left(\mu_{kd}(1-\mu_{kd})\right)\) and that the corresponding entries in \(L_\lambda\) (i.e. the square root of the inverse of the FIM) equal \(\sqrt{\mu_{kd}(1-\mu_{kd})/w_k}\). The block related to the α parameters is \(F_{\alpha,\alpha}=\text{diag}(w)-ww^{\top}\), where \(w=[w_1,\dots,w_K]^{\top}\).
The matrix \(F_{\alpha,\alpha}\) is not invertible (indeed \(F_{\alpha,\alpha}\mathbf{e}=0\), where \(\mathbf{e}=[1,\dots,1]^{\top}\)) due to the dependence among the mixing weights \(\left (\sum_{i=1}^K\alpha _i=\sum_{i=1}^K w_i=1\right )\). Since there are only K−1 degrees of freedom in the mixing weights, as proposed in [63], we can fix \(\alpha_K\) equal to a constant without loss of generality and work with a reduced set of K−1 parameters: \(\tilde {\alpha }=[\alpha _1,\dots ,\alpha _{K-1}]^{\top }\).
Taking into account the Fisher score with respect to \(\tilde {\alpha }\), i.e.
\(G_{\tilde{\alpha}_k}^{X} = \sum_{x\in X}\left(\gamma_x(k)-w_k\right), \quad k=1,\dots,K-1,\)
the corresponding block of the FIM is \(F_{\tilde {\alpha },\tilde {\alpha }}= (\text {diag}(\tilde {w})-\tilde {w}\tilde {w}^{\top } )\), where \(\tilde {w}=[w_1,\dots ,w_{K-1}]^{\top }\). The matrix \(F_{\tilde {\alpha },\tilde {\alpha }}\) is invertible; indeed, it can be decomposed into the product of the invertible diagonal matrix \(D=\text {diag}(\tilde {w})\) and the invertible elementary matrix (see the note on elementary matrices above) \(E(\mathbf {e},\tilde {w},1)= I-\mathbf {e}\tilde {w}^{\top }\); its inverse is
\(F_{\tilde{\alpha},\tilde{\alpha}}^{-1} = \text{diag}(\tilde{w})^{-1} + \frac{1}{w_K}\,\mathbf{e}\mathbf{e}^{\top}.\)
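The decomposition of \(\text{diag}(\tilde{w})-\tilde{w}\tilde{w}^{\top}\) and its closed-form inverse can be checked numerically; this sketch assumes NumPy and uses hypothetical mixing weights:

```python
import numpy as np

# Mixing weights of a K-component mixture (summing to 1); the reduced set
# keeps the first K - 1 of them.
w = np.array([0.5, 0.3, 0.2])
w_tilde, w_K = w[:-1], w[-1]
e = np.ones(len(w_tilde))

F = np.diag(w_tilde) - np.outer(w_tilde, w_tilde)

# Decomposition F = D * E with D = diag(w_tilde), E = I - e w_tilde^T
D = np.diag(w_tilde)
E = np.eye(len(w_tilde)) - np.outer(e, w_tilde)
assert np.allclose(F, D @ E)

# Closed-form inverse: diag(w_tilde)^-1 + (1 / w_K) * e e^T
F_inv = np.diag(1 / w_tilde) + np.outer(e, e) / w_K
assert np.allclose(F @ F_inv, np.eye(len(w_tilde)))
```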
It follows that
\(K_{\tilde{\alpha}}(X,Y)=\left(G_{\tilde{\alpha}}^{X}\right)^{\top} F_{\tilde{\alpha},\tilde{\alpha}}^{-1}\, G_{\tilde{\alpha}}^{Y} = \sum_{k=1}^{K}\frac{1}{w_k}\,G_{\alpha_k}^{X}\,G_{\alpha_k}^{Y},\)
where we used \(\mathbf {e}^{\top } G_{\tilde {\alpha }}^{Z}=\sum_{k=1}^{K-1}\sum_{z\in Z} \left (\gamma _{z}(k)-w_k \right ) =-\sum_{z\in Z} \left (\gamma _{z}(K)-w_K\right )=-G_{{\alpha _K}}^{Z}\).
By defining \(\mathcal {G}_{\alpha _k}^{X} =\frac {1}{\sqrt {w_k}}\sum_{x\in X} \left (\gamma _x(k)-w_{k}\right )\), we finally obtain \(K_{\tilde {\alpha }}(X,Y)=\left (\mathcal {G}_{\alpha }^{X}\right )^{\top } \mathcal {G}_{\alpha }^{Y}\). Note that we do not need to explicitly compute the Cholesky decomposition of the matrix \(F_{\tilde {\alpha },\tilde {\alpha }}^{-1}\), because the Fisher Kernel \(K_{\tilde {\alpha }}(X,Y)\) can be rewritten as the dot product between the feature vectors \(\mathcal {G}_{{\alpha }}^{X}\) and \(\mathcal {G}_{{\alpha }}^{Y}\).
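The equality between the kernel on the reduced parameters and the plain dot product of the normalized score vectors can also be checked numerically; in the sketch below the occupancy sums are simulated (Dirichlet draws standing in for actual BMM posteriors), so the data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
w = rng.random(K)
w /= w.sum()                                 # mixing weights, sum to 1
e = np.ones(K - 1)

def score(n_desc):
    # Simulated per-image score G_alpha_k = sum over descriptors of
    # (gamma(k) - w_k); each occupancy vector sums to 1, so the full
    # K-dimensional score sums to 0 over components.
    g = rng.dirichlet(w * 5, size=n_desc)    # fake occupancy vectors
    return (g - w).sum(axis=0)

GX, GY = score(10), score(7)
F_inv = np.diag(1 / w[:-1]) + np.outer(e, e) / w[-1]
lhs = GX[:-1] @ F_inv @ GY[:-1]              # kernel on reduced parameters
rhs = (GX / np.sqrt(w)) @ (GY / np.sqrt(w))  # dot product of normalized vectors
assert np.allclose(lhs, rhs)
```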
Amato, G., Falchi, F. & Vadicamo, L. Aggregating binary local descriptors for image retrieval. Multimed Tools Appl 77, 5385–5415 (2018). https://doi.org/10.1007/s11042-017-4450-2