Elements of Visual Concept Analysis

  • Lei Wu
  • Xian-Sheng Hua
Part of the Studies in Computational Intelligence book series (SCI, volume 346)


Visual concept analysis and measurement comprise four fundamental techniques:

- low-level visual analysis (image representation),
- image distance measurement (inter-image representation),
- semantic-level concept modeling (concept representation), and
- concept distance measurement (inter-concept representation).

For low-level visual analysis, we discuss visual features, visual words, and image representations, and on that basis we then discuss image distance measurement. Above the low level lies semantic-level analysis, where we focus on concept modeling and concept distance measurement. Methods for semantic-level concept modeling fall roughly into generative and discriminative models. Our later discussion of concept distance measurement builds on generative models, so we mainly emphasize those: the bag-of-words model, the 2D hidden Markov model, and the visual language model.
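The pipeline summarized above runs from local features to concept distance. A small sketch makes it concrete. This is a generic illustration, not the authors' method, and it rests on several assumptions:

- A codebook is already available. Here it is a set of hypothetical toy centroids standing in for, e.g., k-means clusters of SIFT or SURF descriptors.
- Each image is represented as a normalized bag-of-visual-words histogram.
- Images are compared with an L1 distance.
- A concept is modeled as an additively smoothed unigram over visual words, which is the simplest generative bag-of-words model.
- Concept distance is measured with the Jensen-Shannon divergence.

All function names and the particular distance choices are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the chapter's implementation.

def nearest_word(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Quantize one local descriptor to the index of its closest codeword (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])),
    )

def bow_counts(descriptors, codebook) -> List[int]:
    """Raw visual-word counts for one image."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]

def bow_histogram(descriptors, codebook) -> List[float]:
    """Image representation: L1-normalized bag-of-visual-words histogram."""
    counts = bow_counts(descriptors, codebook)
    n = sum(counts)
    return [c / n for c in counts] if n else [0.0] * len(counts)

def l1_distance(h1, h2) -> float:
    """Image distance measurement: L1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def concept_model(image_counts, alpha: float = 1.0) -> List[float]:
    """Generative bag-of-words concept model: a unigram distribution over visual
    words, pooled over a concept's training images with additive (Laplace) smoothing."""
    vocab = len(image_counts[0])
    pooled = [sum(counts[k] for counts in image_counts) for k in range(vocab)]
    total = sum(pooled) + alpha * vocab
    return [(c + alpha) / total for c in pooled]

def kl_divergence(p, q) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q) -> float:
    """Concept distance: Jensen-Shannon divergence, symmetric and bounded by ln 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Usage with a toy 2-D "descriptor" space and a 3-word codebook (hypothetical data).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
img_a = [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2)]  # descriptors near words 0 and 1
img_b = [(4.8, 5.1), (5.2, 4.9), (1.1, 0.9)]  # descriptors near words 2 and 1

h_a, h_b = bow_histogram(img_a, codebook), bow_histogram(img_b, codebook)
print(h_a, h_b, l1_distance(h_a, h_b))

p = concept_model([bow_counts(img_a, codebook)])
q = concept_model([bow_counts(img_b, codebook)])
print(p, q, js_divergence(p, q))
```

Each stage can be swapped out independently. You could use another histogram distance (e.g., chi-squared or histogram intersection), or a richer generative model such as an n-gram model over spatially adjacent visual words. Either change affects only the corresponding function, while the quantize, represent, and compare structure stays the same.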







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Lei Wu (1)
  • Xian-Sheng Hua (2)

  1. MOE-MS Key Lab of MCC, University of Science and Technology of China, China
  2. Microsoft Research Asia, China
