
Multilayer Semantic Analysis in Image Databases

  • Ismail El Sayad
  • Jean Martinet
  • Zhongfei (Mark) Zhang
  • Peter Eisert
Chapter
Part of the Annals of Information Systems book series (AOIS, volume 17)

Abstract

With massive amounts of digital images now available in personal and online collections, effective techniques for navigating, indexing and searching images have become increasingly crucial. In this chapter, we rely on the visual content of images as the main source of information for representing them. Starting from the bag of visual words (BOW) representation, we learn a higher-level visual representation in which each image is modeled as a mixture of visual topics that are depicted in the image and related to high-level topics. First, we introduce a new probabilistic topic model, the Multilayer Semantic Significance Analysis (MSSA) model, to study the semantic inference of the constructed visual words, and use it to generate Semantically Significant Visual Words (SSVWs). Second, we strengthen the discriminative power of SSVWs by constructing Semantically Significant Visual Phrases (SSVPs) from frequently co-occurring SSVWs that are semantically coherent. We partially bridge the intra-class visual diversity of images by re-indexing the SSVWs and SSVPs based on their distributional clustering, which yields a Semantically Significant Invariant Visual Glossary (SSIVG) representation. Finally, we propose a new Multiclass Vote-Based Classifier (MVBC) built on the SSIVG representation. Extensive large-scale experiments show that the proposed higher-level visual representation outperforms traditional part-based image representations in retrieval, classification, and object recognition.
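
The BOW starting point and the idea of building phrases from frequently co-occurring visual words can be illustrated with a minimal sketch. It assumes a small, already-learned codebook and hard nearest-centroid assignment; the codebook, the toy 2-D descriptors, and the function names quantize, bow_histogram and frequent_word_pairs are hypothetical. The sketch builds a BOW histogram for one image and counts visual-word pairs that co-occur in at least min_support images, one simple way to obtain candidate phrases. It does not implement MSSA, the semantic-coherence test that turns such pairs into SSVPs, or the chapter's actual co-occurrence criterion, none of which the abstract specifies.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helpers illustrating BOW quantization and frequent-pair
# counting; this is NOT the chapter's MSSA / SSVP construction.

def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codebook
    centroid (squared Euclidean distance, hard assignment)."""
    words = []
    for desc in descriptors:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])),
        )
        words.append(best)
    return words

def bow_histogram(descriptors, codebook):
    """Bag of visual words: count how often each visual word occurs in one image."""
    hist = [0] * len(codebook)
    for word in quantize(descriptors, codebook):
        hist[word] += 1
    return hist

def frequent_word_pairs(images_words, min_support):
    """Candidate visual phrases: unordered visual-word pairs that co-occur in
    at least `min_support` images (each pair counted once per image)."""
    counts = Counter()
    for words in images_words:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]  # toy 3-word vocabulary
    image = [(0.1, 0.0), (0.9, 1.1), (4.8, 5.2)]     # toy 2-D descriptors
    print(bow_histogram(image, codebook))  # [1, 1, 1]
    corpus = [[0, 1, 2], [0, 1], [1, 2, 2], [0, 1, 1]]  # visual words per image
    print(frequent_word_pairs(corpus, min_support=3))  # {(0, 1): 3}
```

Each image is reduced to a set of visual-word indices before counting, so a pair contributes at most once per image however often its words repeat. Real local descriptors (e.g. 128-D SIFT or 64-D SURF vectors) would replace the 2-D toy points.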

Keywords

Image Retrieval, Visual Word, Query Image, Latent Dirichlet Allocation, Sparse Code
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ismail El Sayad (1)
  • Jean Martinet (2)
  • Zhongfei (Mark) Zhang (3)
  • Peter Eisert (1)
  1. Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  2. Villeneuve d’Ascq, Lille 1 University, Lille, France
  3. Computer Science Department, SUNY at Binghamton, Binghamton, USA
