Hierarchical Image Representation Using Deep Network

Ergul, Emrah; Erturk, Sarp; Arica, Nafiz

doi:10.1007/978-3-319-23234-8_7

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9280)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

In this paper, we propose a new method for learning features from unlabeled data. We simulate the k-means algorithm in a deep network architecture to obtain hierarchical Bag-of-Words (BoW) representations. In each layer, we first learn visual words, which are then used to produce BoW feature vectors in the current input space. We transform the raw input data into new feature spaces in a convolutional manner, so that more abstract visual words are extracted at each layer by means of the Expectation-Maximization (EM) algorithm. In the Expectation step, the network parameters are optimized while the visual words are kept fixed; in the Maximization step, the visual words are updated using the current network parameters. In addition, we embed spatial information into the BoW representation by learning separate networks and visual words for each quadrant region. We compare the proposed algorithm with similar approaches in the literature on CIFAR-10, a challenging 10-class dataset.
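As a rough illustration of the ingredients named in the abstract, the sketch below learns visual words with plain k-means on image patches and builds a BoW vector from one histogram per image quadrant. It is not the authors' deep network: there are no learned network parameters or stacked layers, and a single shared codebook is used for all four quadrants, whereas the paper learns separate ones. The function names, the patch size, the stride and the number of words are assumptions chosen for the example.

```python
import numpy as np

def extract_patches(img, size=6, stride=2):
    """Return flattened size x size patches and their top-left (row, col) positions."""
    h, w = img.shape[:2]
    patches, coords = [], []
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            patches.append(img[r:r + size, c:c + size].ravel())
            coords.append((r, c))
    return np.asarray(patches, dtype=np.float64), np.asarray(coords)

def assign(patches, words):
    """Hard-assign each patch to its nearest visual word (squared Euclidean distance)."""
    d = ((patches[:, None, :] - words[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(patches, k, iters=10, seed=0):
    """Plain k-means: alternate between assigning patches and updating the words."""
    rng = np.random.default_rng(seed)
    words = patches[rng.choice(len(patches), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(patches, words)      # words fixed, assignments updated
        for j in range(k):                   # assignments fixed, words updated
            members = patches[labels == j]
            if len(members):                 # leave an empty cluster unchanged
                words[j] = members.mean(axis=0)
    return words

def quadrant_bow(img, words, size=6, stride=2):
    """Concatenate one L1-normalized word histogram per image quadrant."""
    patches, coords = extract_patches(img, size, stride)
    labels = assign(patches, words)
    h, w = img.shape[:2]
    centers = coords + size / 2.0
    # Quadrant index: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    quad = (centers[:, 0] >= h / 2).astype(int) * 2 + (centers[:, 1] >= w / 2).astype(int)
    hist = np.zeros((4, len(words)))
    np.add.at(hist, (quad, labels), 1)       # count word occurrences per quadrant
    hist /= np.maximum(hist.sum(axis=1, keepdims=True), 1)
    return hist.ravel()

# Usage on a random stand-in for a 32 x 32 RGB image (the CIFAR-10 image size).
img = np.random.default_rng(1).random((32, 32, 3))
train_patches, _ = extract_patches(img)
words = kmeans(train_patches, k=8)
feat = quadrant_bow(img, words)              # vector of length 4 * 8
```

In the paper's setting, the feature vectors of one layer would serve as input to the next layer so that the hierarchy of visual words becomes more abstract; the sketch stops after a single level.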

Author information

Authors and Affiliations

Electronics & Communication Engineering Department of Kocaeli University, Kocaeli, Turkey
Emrah Ergul & Sarp Erturk
Software Engineering Department of Bahcesehir University, Istanbul, Turkey
Nafiz Arica

Corresponding author

Correspondence to Nafiz Arica.

Editor information

Editors and Affiliations

Pattern Analysis and Computer Vision, Istituto Italiano di Tecnologia (IIT), Genoa, Italy
Vittorio Murino
Università di Genova, Genoa, Italy
Enrico Puppo

Cite this paper

Ergul, E., Erturk, S., Arica, N. (2015). Hierarchical Image Representation Using Deep Network. In: Murino, V., Puppo, E. (eds) Image Analysis and Processing — ICIAP 2015. ICIAP 2015. Lecture Notes in Computer Science, vol 9280. Springer, Cham. https://doi.org/10.1007/978-3-319-23234-8_7

DOI: https://doi.org/10.1007/978-3-319-23234-8_7
Published: 21 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23233-1
Online ISBN: 978-3-319-23234-8
eBook Packages: Computer Science, Computer Science (R0)
