Multimedia Tools and Applications, Volume 39, Issue 2, pp 169–188

Semantic image classification using statistical local spatial relations model

  • Dongfeng Han
  • Wenhui Li (corresponding author)
  • Zongcheng Li

Abstract

This paper presents the statistical local spatial relations (SLSR) model, a learning model that combines spatial and statistical information for semantic image classification. The model is inspired by probabilistic Latent Semantic Analysis (PLSA) from text mining, where PLSA discovers topics in a corpus using the bag-of-words document representation. In SLSR, we treat image categories as topics, so an image containing instances of multiple categories can be modeled as a mixture of topics. More significantly, SLSR introduces spatial relation information, a factor that is not present in PLSA. SLSR is invariant to rotation, scale, translation and affine transformations, and it can handle partial occlusion. The model is learned with a Dirichlet process and a variational Expectation-Maximization algorithm and is implemented as an image classification algorithm. Learning is unsupervised and captures spatial relations and statistical information simultaneously. Experiments on standard data sets show that SLSR is a promising model for semantic image classification.
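To make the topic-mixture idea concrete, the following is a minimal sketch of plain PLSA fitted by EM on a toy bag-of-words count matrix, where P(w|d) = Σ_z P(w|z) P(z|d). It is not the SLSR model: it has no spatial relation factor, no Dirichlet process and no variational inference. The function name `plsa`, its parameters, and the toy counts are assumptions made purely for illustration; in the image setting the "words" would be quantized local region descriptors.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain PLSA by EM on a document-by-word count matrix.

    counts[d][w] is how often (visual) word w occurs in document (image) d.
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z), p_z_d[d][z] = P(z|d).
    Hypothetical helper for illustration only, not the authors' SLSR.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialisation; every row is a probability distribution.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Tiny constant keeps every probability strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy corpus: five "images" over a four-word visual vocabulary.
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
    [2, 2, 2, 2],  # an image mixing both kinds of content
]
p_w_z, p_z_d = plsa(counts, n_topics=2)
for d, mixture in enumerate(p_z_d):
    print(d, [round(p, 3) for p in mixture])
```

Each printed row is the estimated topic mixture P(z|d) for one image. Because the bag-of-words representation discards where the words occur, a model of this kind cannot use layout, which is the gap the spatial relation factor in SLSR is meant to address.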

Keywords

Statistical local spatial relations model; Semantic image classification; Variational expectation maximization; Invariant local regions; Graph model

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, People’s Republic of China
  2. School of Engineering Technology, Shandong University of Technology, Zibo, People’s Republic of China
