Abstract
With massive numbers of digital images now available in personal and online collections, effective techniques for navigating, indexing, and searching images are increasingly crucial. In this article, we rely on visual content as the main source of information for representing images. Starting from the bag of visual words (BOW) representation, we learn a high-level visual representation in which each image is modeled as a mixture of the visual topics it depicts, each related to a high-level topic. First, we introduce a new probabilistic topic model, the Multilayer Semantic Significance Analysis (MSSA) model, to infer the semantics of the constructed visual words; from it, we generate Semantically Significant Visual Words (SSVWs). Second, we strengthen the discriminative power of the SSVWs by constructing Semantically Significant Visual Phrases (SSVPs) from frequently co-occurring SSVWs that are semantically coherent. We partially bridge the intra-class visual diversity of the images by re-indexing the SSVWs and SSVPs according to their distributional clustering, which yields a Semantically Significant Invariant Visual Glossary (SSIVG) representation. Finally, we propose a new Multiclass Vote-Based Classifier (MVBC) built on the SSIVG representation. Extensive large-scale experiments show that the proposed higher-level visual representation outperforms traditional part-based image representations in retrieval, classification, and object recognition.
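To make the starting point concrete, here is a minimal sketch of a generic bag-of-visual-words histogram, followed by a simple count of visual words that co-occur across images. It is not the MSSA model or the authors' phrase-construction procedure; the function names, the toy codebook, and the `min_support` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook):
    """Count how often each visual word occurs in one image."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1
    return hist


def frequent_pairs(images_words, min_support):
    """Return word pairs that co-occur in at least `min_support` images."""
    counts = Counter()
    for words in images_words:
        # Each pair is counted at most once per image.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy data (hypothetical): a 3-word codebook of 2-D descriptors and two images.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8)]
image_b = [(4.8, 5.1), (0.2, 0.0), (1.0, 0.9)]

hist_a = bow_histogram(image_a, codebook)
hist_b = bow_histogram(image_b, codebook)
words_a = [nearest_word(d, codebook) for d in image_a]
words_b = [nearest_word(d, codebook) for d in image_b]
pairs = frequent_pairs([words_a, words_b], min_support=2)
```

In this toy setting a pair counts as a phrase candidate purely by image-level co-occurrence; the approach described in the abstract additionally requires the co-occurring words to be semantically coherent, which this sketch does not model.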
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
El Sayad, I., Martinet, J., Zhang, Z., Eisert, P. (2015). Multilayer Semantic Analysis in Image Databases. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_19
DOI: https://doi.org/10.1007/978-3-319-07812-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07811-3
Online ISBN: 978-3-319-07812-0
eBook Packages: Business and Economics, Business and Management (R0)