Restricted Boltzmann machine as an aggregation technique for binary descriptors

Abstract

The article presents a novel approach to real-time image classification with deep neural networks. The proposed architecture exploits computationally efficient local binary descriptors and uses a restricted Boltzmann machine (RBM) as a feature-space projection step, so that the depth of the network can be reduced. A contrastive divergence procedure is used both for RBM training and for feature projection. The resulting networks perform close to the current state of the art while having a small memory footprint (i.e., number of parameters) and a low computational cost (i.e., short response time). The low number of parameters makes these architectures applicable in embedded systems with limited memory or reduced computational capabilities.
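To make the projection idea concrete, the following is a minimal sketch of a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1) on binary descriptors. It is not the authors' implementation. The class name `BinaryRBM`, the function `aggregate`, the 256-bit descriptor length, the 64 hidden units, the learning rate, the random toy data, and the mean pooling of hidden activations into one vector per image are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BinaryRBM:
    """Hypothetical Bernoulli-Bernoulli RBM trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        # P(h_j = 1 | v) for every hidden unit
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        # P(v_i = 1 | h) for every visible unit
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again
        pv1 = self.visible_probs(h0)
        v1 = (self.rng.random(pv1.shape) < pv1).astype(float)
        ph1 = self.hidden_probs(v1)
        # CD-1 update: data statistics minus reconstruction statistics
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)
        # Mean squared reconstruction error, for monitoring only
        return float(np.mean((v0 - pv1) ** 2))


def aggregate(rbm, descriptors):
    """Project each descriptor to hidden space, then mean-pool (an assumption)."""
    return rbm.hidden_probs(descriptors).mean(axis=0)


# Toy stand-in for the binary descriptors of one image: 500 random 256-bit vectors
rng = np.random.default_rng(1)
descriptors = (rng.random((500, 256)) < 0.5).astype(float)

rbm = BinaryRBM(n_visible=256, n_hidden=64)
errors = [rbm.cd1_step(descriptors) for _ in range(20)]
image_vector = aggregate(rbm, descriptors)
```

In this sketch the variable number of local descriptors per image collapses into a single fixed-length vector (`image_vector`, 64 values here), which is the kind of compact input a shallow classifier could consume. Real descriptors would come from a binary descriptor extractor rather than from random bits.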

Author information

Corresponding author

Correspondence to Rafal Kapela.

Cite this article

Sobczak, S., Kapela, R., McGuinness, K. et al. Restricted Boltzmann machine as an aggregation technique for binary descriptors. Vis Comput 37, 423–432 (2021). https://doi.org/10.1007/s00371-019-01782-8

Keywords

  • Restricted Boltzmann machine
  • Image local binary descriptors
  • Aggregation techniques of feature vectors