Abstract
In this paper, we propose a new method for learning features from unlabeled data. We simulate the k-means algorithm in a deep network architecture to obtain hierarchical Bag-of-Words (BoW) representations. In each layer, we first learn visual words, which are used to produce BoW feature vectors in the current input space. We then transform the raw input data into new feature spaces in a convolutional manner, so that more abstract visual words are extracted at each layer by applying the Expectation-Maximization (EM) algorithm. In the Expectation step, the network parameters are optimized while the visual words are kept fixed; in the Maximization step, the visual words are updated using the current network parameters. In addition, we embed spatial information into the BoW representation by learning separate networks and visual words for each quadrant region. We compare the proposed algorithm with similar approaches in the literature on CIFAR-10, a challenging 10-class dataset.
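The BoW and quadrant ideas in the abstract can be illustrated with a minimal sketch. It rests on strong assumptions: plain k-means on fixed feature vectors stands in for the learned network, so the alternation between network parameters and visual words is not reproduced, and every function name, parameter, and toy value below is hypothetical rather than taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(feature, words):
    """Index of the visual word closest to one feature vector."""
    return min(range(len(words)), key=lambda k: sq_dist(feature, words[k]))


def learn_words(features, n_words, n_iters=10, seed=0):
    """Plain k-means (assumption: no network is trained here)."""
    rng = random.Random(seed)
    words = [list(f) for f in rng.sample(features, n_words)]
    for _ in range(n_iters):
        # Assign every feature to its nearest visual word.
        labels = [nearest_word(f, words) for f in features]
        # Move each visual word to the mean of its assigned features.
        for k in range(n_words):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:  # an unused word stays where it is
                words[k] = [sum(col) / len(members) for col in zip(*members)]
    return words


def bow_histogram(features, words):
    """L1-normalised count of how often each visual word is the nearest one."""
    counts = [0.0] * len(words)
    for f in features:
        counts[nearest_word(f, words)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def quadrant_of(pos, height, width):
    """0: top-left, 1: top-right, 2: bottom-left, 3: bottom-right."""
    row, col = pos
    return 2 * int(row >= height / 2) + int(col >= width / 2)


def quadrant_bow(features, positions, words_per_quadrant, height, width):
    """Concatenate one BoW histogram per quadrant, each with its own words."""
    groups = [[], [], [], []]
    for f, p in zip(features, positions):
        groups[quadrant_of(p, height, width)].append(f)
    vector = []
    for feats, words in zip(groups, words_per_quadrant):
        vector.extend(bow_histogram(feats, words))
    return vector


# Toy data (hypothetical): 8-dimensional random "patch features" sampled on a
# regular grid over a 32x32 image, 16 patches per quadrant.
rng = random.Random(0)
positions = [(r, c) for r in range(0, 32, 4) for c in range(0, 32, 4)]
features = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in positions]

quadrant_feats = [[], [], [], []]
for f, p in zip(features, positions):
    quadrant_feats[quadrant_of(p, 32, 32)].append(f)

# One separate vocabulary of 4 visual words per quadrant.
words_per_quadrant = [learn_words(q, n_words=4) for q in quadrant_feats]
vector = quadrant_bow(features, positions, words_per_quadrant, 32, 32)
print(len(vector))  # 4 quadrants x 4 words = 16
```

Concatenating per-quadrant histograms keeps coarse spatial layout that a single global histogram would discard, at the cost of a feature vector four times as long.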
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ergul, E., Erturk, S., Arica, N. (2015). Hierarchical Image Representation Using Deep Network. In: Murino, V., Puppo, E. (eds) Image Analysis and Processing — ICIAP 2015. Lecture Notes in Computer Science, vol 9280. Springer, Cham. https://doi.org/10.1007/978-3-319-23234-8_7
DOI: https://doi.org/10.1007/978-3-319-23234-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23233-1
Online ISBN: 978-3-319-23234-8
eBook Packages: Computer Science, Computer Science (R0)