
Signal, Image and Video Processing, Volume 7, Issue 4, pp 759–775

Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification

  • Yousef Alqasrawi
  • Daniel Neagu
  • Peter I. Cowling
Original Paper

Abstract

The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization. Most approaches that use the BOW model to categorize images ignore useful information that can be obtained from image classes when building visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of natural scene images. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and the accuracy of natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from the training images of one scene image dataset can plausibly represent another scene image dataset in the same domain, which reduces the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, obtained with support vector machines using a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
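The core pipeline described above (class-specific vocabularies merged into one integrated vocabulary, BOW histograms, and a histogram intersection kernel for the SVM) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-class vocabularies $V_j$ are assumed to be already learned (for example by k-means on each class's training descriptors), descriptors are hard-assigned to their nearest visual word by Euclidean distance, and histograms are normalised by $N_d$. All function names are hypothetical.

```python
import numpy as np


def integrate_vocabularies(class_vocabularies):
    """Stack the M class-specific vocabularies V_1..V_M (each k x D) into one |V| x D array."""
    return np.vstack(class_vocabularies)


def bow_histogram(descriptors, vocabulary, normalize=True):
    """Return h(d): one bin per visual word, counting the descriptors nearest to it.

    Assumption: hard assignment by squared Euclidean distance; if normalize is True,
    the counts are divided by N_d (the number of descriptors in the image).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Pairwise squared distances, shape (N_d, |V|).
    dist2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(float)
    if normalize and len(descriptors) > 0:
        hist /= len(descriptors)
    return hist


def histogram_intersection(h1, h2):
    """Histogram intersection kernel: K(h1, h2) = sum_i min(h1_i, h2_i)."""
    return float(np.minimum(h1, h2).sum())


def hik_gram(A, B):
    """Gram matrix of the histogram intersection kernel between rows of A and rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


# Example with two hypothetical one-word "class vocabularies" in a 1-D feature space.
V = integrate_vocabularies([np.array([[0.0]]), np.array([[10.0]])])
h = bow_histogram([[1.0], [2.0], [9.0]], V)  # two descriptors near 0, one near 10
```

Here `h` is `[2/3, 1/3]`. A Gram matrix from `hik_gram` can be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.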

Keywords

Image classification; Natural scenes; Bag of visual words; Integrated visual vocabulary; Pyramidal colour moments; Feature fusion; Semantic modelling

Abbreviations

$M$: Number of classes

$C$: Set of $M$ scene classes

$V$: Set of $M$ class-specific vocabularies

$V_j$: Set of $k$ visual words learned from the training images of class $j$

$v_i$: $i$th visual word

$u_j$: $j$th visual word

$|V|$: Size of the visual vocabulary

$h(d)$: Histogram of visual words for image $d$

$h_i(d)$: Number of descriptors in image $d$ assigned to the visual word $v_i$ (the $i$th bin of $h(d)$)

$N_d$: Total number of descriptors in image $d$

$L$: Number of levels in the spatial pyramid layout

$h^l(d_{r_i})$: BOW histogram vector for image $d$ at level $l$ and sub-region $r_i$

$c^l(d_{r_i})$: Colour moments vector for image $d$ at level $l$ and sub-region $r_i$

$m$: Number of images in the training dataset

$T$: Real-valued threshold vector

$T^l_{r_i}$: Average keypoint density at level $l$ and image sub-region $r_i$ over the $m$ training images

$H(d)$: Feature vector for image $d$, obtained by concatenating the BOW histograms and the weighted pyramidal colour moments

$w$: Weight vector indicating the importance of colour information

$K$: Kernel function
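The spatial pyramid quantities defined above can be tied together in a short sketch: split the image into a $2^l \times 2^l$ grid at each level $l = 0, \dots, L-1$, compute a colour moments vector $c^l(d_{r_i})$ per sub-region, and concatenate the BOW histograms $h^l(d_{r_i})$ with the weighted colour moments to form $H(d)$. Several details are assumptions not stated here: the grid follows the common spatial pyramid convention, the colour moments are taken to be the per-channel mean, standard deviation and cube root of the third central moment, and $w$ is treated as one scalar per sub-region. The paper derives $w$ from keypoint density relative to the thresholds $T^l_{r_i}$, but that rule is not given in this section, so the sketch takes $w$ as an input. All function names are hypothetical.

```python
import numpy as np


def pyramid_regions(height, width, num_levels):
    """Yield (level, row_slice, col_slice) for a 2^l x 2^l grid at levels l = 0..num_levels-1."""
    for level in range(num_levels):
        cells = 2 ** level
        rows = np.linspace(0, height, cells + 1).astype(int)
        cols = np.linspace(0, width, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                yield level, slice(rows[r], rows[r + 1]), slice(cols[c], cols[c + 1])


def colour_moments(region):
    """Assumed moments: per-channel mean, standard deviation, cube root of 3rd central moment."""
    pixels = region.reshape(-1, region.shape[-1]).astype(float)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])


def pyramidal_colour_moments(image, num_levels):
    """List of c^l(d_{r_i}) vectors, ordered level by level, row-major within a level."""
    height, width = image.shape[:2]
    return [colour_moments(image[rs, cs])
            for _, rs, cs in pyramid_regions(height, width, num_levels)]


def fuse_features(bow_histograms, colour_vectors, weights):
    """H(d): all BOW histograms h^l(d_{r_i}), then each c^l(d_{r_i}) scaled by its weight.

    Assumption: weights holds one scalar per sub-region, computed elsewhere.
    """
    if not (len(bow_histograms) == len(colour_vectors) == len(weights)):
        raise ValueError("need one BOW histogram, colour vector and weight per sub-region")
    bow = np.concatenate(bow_histograms)
    colour = np.concatenate([w * c for w, c in zip(weights, colour_vectors)])
    return np.concatenate([bow, colour])
```

With $L = 3$ levels this layout gives $1 + 4 + 16 = 21$ sub-regions, so an RGB image yields 21 colour vectors of length 9 each.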



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Yousef Alqasrawi (corresponding author), School of Computing, Informatics and Media (SCIM), University of Bradford, Bradford, UK
  • Daniel Neagu, School of Computing, Informatics and Media (SCIM), University of Bradford, Bradford, UK
  • Peter I. Cowling, School of Computing, Informatics and Media (SCIM), University of Bradford, Bradford, UK
