Multilayer Semantic Analysis in Image Databases

El Sayad, Ismail; Martinet, Jean; Zhang, Zhongfei (Mark); Eisert, Peter

doi:10.1007/978-3-319-07812-0_19


Chapter
First Online: 14 November 2014

Part of the book series: Annals of Information Systems (AOIS, volume 17)

Abstract

With massive numbers of digital images now available in personal and online collections, effective techniques for navigating, indexing, and searching images have become increasingly important. In this article, we rely on visual content as the main source of information for representing images. Starting from the bag of visual words (BOW) representation, we learn a high-level visual representation in which each image is modeled as a mixture of the visual topics it depicts, and these visual topics are related to high-level topics. First, we introduce a new probabilistic topic model, the Multilayer Semantic Significance Analysis (MSSA) model, to infer the semantics of the constructed visual words; from this inference we generate Semantically Significant Visual Words (SSVWs). Second, we strengthen the discriminative power of the SSVWs by constructing Semantically Significant Visual Phrases (SSVPs) from frequently co-occurring SSVWs that are semantically coherent. We then partially bridge the intra-class visual diversity of images by re-indexing the SSVWs and SSVPs according to their distributional clustering, which yields a Semantically Significant Invariant Visual Glossary (SSIVG) representation. Finally, we propose a new Multiclass Vote-Based Classifier (MVBC) built on the SSIVG representation. Extensive large-scale experiments show that the proposed higher-level visual representation outperforms traditional part-based image representations in retrieval, classification, and object recognition.
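
As a rough illustration of the starting point described in the abstract, the sketch below builds bag-of-visual-words histograms and then keeps pairs of visual words that co-occur in enough images. It is only a toy approximation under stated assumptions: the function names (`quantize`, `bow_histogram`, `frequent_pairs`), the `min_support` threshold, and the 2-D toy descriptors are all hypothetical, and the MSSA topic model, the semantic-coherence test, and the distributional clustering are not reproduced here because the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations


def quantize(descriptors, codebook):
    """Map each descriptor to the index of its nearest codeword (squared Euclidean)."""
    words = []
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        words.append(dists.index(min(dists)))
    return words


def bow_histogram(words, vocab_size):
    """Count how often each visual word occurs in one image."""
    hist = [0] * vocab_size
    for w in words:
        hist[w] += 1
    return hist


def frequent_pairs(images_words, min_support):
    """Return word pairs that co-occur in at least `min_support` images.

    Assumption: co-occurrence is counted per image, ignoring spatial layout.
    """
    counts = Counter()
    for words in images_words:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy data: a 3-word codebook of 2-D "descriptors" (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
images = [
    [(0.1, 0.0), (0.9, 0.1), (0.0, 0.1)],
    [(1.1, 0.0), (0.1, 0.1), (0.8, 0.0)],
    [(0.0, 0.9), (0.1, 1.1)],
]
images_words = [quantize(img, codebook) for img in images]
histograms = [bow_histogram(w, len(codebook)) for w in images_words]
phrases = frequent_pairs(images_words, min_support=2)
```

In this toy run, the only candidate phrase that survives the support threshold is the pair of words 0 and 1, which appear together in the first two images. A real system would use local descriptors such as SURF and a learned codebook, and would filter candidate phrases further.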



Author information

Authors and Affiliations

Fraunhofer Heinrich Hertz Institute, Berlin, Germany
Ismail El Sayad & Peter Eisert
Villeneuve d’Ascq, Lille 1 University, Lille, France
Jean Martinet
Computer Science Department, SUNY at Binghamton, Binghamton, NY, 13905, USA
Zhongfei (Mark) Zhang

Corresponding author

Correspondence to Ismail El Sayad.

Editor information

Editors and Affiliations

Research & Advanced Engineering, Ford Motor Company, Dearborn, Michigan, USA
Mahmoud Abou-Nasr
Universität Hamburg Inst. Wirtschaftsinformatik, Hamburg, Germany
Stefan Lessmann
Universität Hamburg Inst. Wirtschaftsinformatik, Hamburg, Germany
Robert Stahlbock
Department of Computer & Information Science, Fordham University, Bronx, New York, USA
Gary M. Weiss

About this chapter

Cite this chapter

El Sayad, I., Martinet, J., Zhang, Z., Eisert, P. (2015). Multilayer Semantic Analysis in Image Databases. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_19

DOI: https://doi.org/10.1007/978-3-319-07812-0_19
Published: 14 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07811-3
Online ISBN: 978-3-319-07812-0
eBook Packages: Business and Economics; Business and Management (R0)
