Mining the Discriminative Word Sets for Bag-of-Words Model Based on Distributional Similarity Graph

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9461)

Abstract

Most previous distributional clustering methods are fundamentally unsupervised, and the discriminative property of words is not well modeled in the clustering procedure. In this paper, we propose a supervised model that incorporates the class-conditional probability into the word similarity measure and casts word-set extraction as a supervised graph-partition optimization problem. A greedy algorithm is proposed to solve this model; it combines word selection and word grouping in a unified framework. By grouping related words, the method essentially replaces the exact match between word bins with a fuzzy match between groups of related-word bins, which to some extent alleviates the synonymy problem in the BoW model. Experiments demonstrate that the proposed method is applicable to both text and image data sets, and that it improves retrieval precision while reducing the lexicon size.
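
The idea of selecting discriminative words and grouping them by class-conditional similarity can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the similarity measure, the graph-partition objective, or the greedy rule, so everything specific here is an assumption made for illustration. The class-conditional probability is taken to be P(class | word), similarity is measured with the Jensen-Shannon divergence, and the function names and the `min_purity` and `max_div` thresholds are hypothetical.

```python
import math
from collections import defaultdict


def class_conditionals(docs, labels):
    """Estimate P(class | word) from labelled token lists (assumed form)."""
    classes = sorted(set(labels))
    counts = defaultdict(lambda: [0] * len(classes))
    for tokens, label in zip(docs, labels):
        ci = classes.index(label)
        for tok in tokens:
            counts[tok][ci] += 1
    return {w: [c / sum(row) for c in row] for w, row in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def greedy_word_sets(cond, min_purity=0.8, max_div=0.05):
    """Select discriminative words, then greedily group similar ones.

    Simplified stand-in for a graph-partition step: each word joins the
    first existing group whose seed word has a similar class profile.
    """
    # Selection: keep words whose dominant class probability is high.
    selected = sorted(
        (w for w, p in cond.items() if max(p) >= min_purity),
        key=lambda w: (-max(cond[w]), w),
    )
    groups = []
    for w in selected:
        for g in groups:
            if js_divergence(cond[w], cond[g[0]]) <= max_div:
                g.append(w)
                break
        else:  # no similar group found, so start a new one
            groups.append([w])
    return groups


def group_histogram(tokens, groups):
    """Map a token list to counts over word groups instead of single words."""
    index = {w: i for i, g in enumerate(groups) for w in g}
    hist = [0] * len(groups)
    for tok in tokens:
        if tok in index:
            hist[index[tok]] += 1
    return hist


# Toy data, invented for this example only.
docs = [["car", "auto", "road"], ["auto", "car", "the"],
        ["cat", "kitten", "the"], ["kitten", "cat", "road"]]
labels = ["vehicle", "vehicle", "animal", "animal"]

groups = greedy_word_sets(class_conditionals(docs, labels))
print(groups)
print(group_histogram(["car", "road"], groups))
print(group_histogram(["auto"], groups))
```

In this toy run, "road" and "the" occur equally in both classes and are dropped at the selection step, which shrinks the lexicon. A query containing only "car" and a document containing only "auto" share no word bin, yet they fall into the same group bin, which is the fuzzy match between groups of related words that the abstract describes.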

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Computer, Guangdong University of Technology, Guangdong, China
