Multimedia Tools and Applications, Volume 69, Issue 3, pp 897–920

HWVP: hierarchical wavelet packet descriptors and their applications in scene categorization and semantic concept retrieval

  • Xueming Qian
  • Danping Guo
  • Xingsong Hou
  • Zhi Li
  • Huan Wang
  • Guizhong Liu
  • Zhe Wang


The wavelet packet transform is an effective texture analysis approach based on sub-band filtering. Different texture patterns respond distinctively to the sub-bands of wavelet packets, and these responses are valuable for texture description. Using the responses of all sub-bands at different resolutions can improve the power to discriminate between texture patterns. In this paper, effective texture descriptors based on the hierarchical wavelet packet (HWVP) transform are proposed. The fine-grained sub-bands of the wavelet packet transform improve the discrimination power of HWVP descriptors for images in different categories. The scene categorization performance of the HWVP descriptors under various decomposition levels and wavelet bases is discussed. The performance of HWVP descriptors computed on global and local images with different partition patterns is also analyzed. The advantages of HWVP descriptors are attributable to two aspects. First, sub-band filtering improves the discrimination power of the descriptors by capturing subtle differences between texture patterns. Second, the hierarchical feature representation makes the descriptors robust to resolution variations. Comparisons are made with some existing robust descriptors on scene categorization and semantic concept retrieval. Experimental results on the widely used OT, Scene-13, Sport Event, and TRECVID 2007 datasets show the effectiveness of the proposed HWVP descriptors.
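The idea of collecting sub-band responses from every decomposition level can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar basis, the use of the mean and standard deviation of absolute coefficients as sub-band statistics, and the names `haar_split`, `band_stats`, and `hwvp_descriptor` are all assumptions made for illustration. The image is assumed to be a list of rows whose height and width are divisible by 2 raised to the number of levels.

```python
import math

def haar_split(img):
    """One 2-D Haar step: split an image into four half-size sub-bands."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    bands = [[[0.0] * cols for _ in range(rows)] for _ in range(4)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            bands[0][r][c] = (a + b + d + e) / 2.0  # low-pass (approximation)
            bands[1][r][c] = (a + b - d - e) / 2.0  # difference between rows
            bands[2][r][c] = (a - b + d - e) / 2.0  # difference between columns
            bands[3][r][c] = (a - b - d + e) / 2.0  # diagonal difference
    return bands

def band_stats(band):
    """Mean and standard deviation of absolute coefficients (assumed statistics)."""
    vals = [abs(v) for row in band for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [mean, math.sqrt(var)]

def hwvp_descriptor(img, levels):
    """Concatenate sub-band statistics from every level of a full packet tree."""
    bands, feat = [img], []
    for _ in range(levels):
        # Wavelet packets split every sub-band again, not only the low-pass one.
        bands = [sub for band in bands for sub in haar_split(band)]
        for band in bands:
            feat.extend(band_stats(band))
    return feat

# Toy 8x8 image with vertical stripes (pixel value alternates along each row).
stripes = [[float(c % 2) for c in range(8)] for _ in range(8)]
descriptor = hwvp_descriptor(stripes, levels=2)
print(len(descriptor))  # 2 statistics * (4 + 16) sub-bands = 40 values
```

Keeping the statistics from the coarser levels alongside those from the finest level is what makes the representation hierarchical in this sketch; a descriptor built only from the last level would have 32 values instead of 40.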


Keywords: Scene categorization, Wavelet packet, TRECVID, Concept retrieval, SVM



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Xueming Qian (1)
  • Danping Guo (1)
  • Xingsong Hou (1)
  • Zhi Li (1)
  • Huan Wang (1)
  • Guizhong Liu (1)
  • Zhe Wang (1)

  1. School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China
