
Multimedia Tools and Applications, Volume 77, Issue 17, pp 22145–22158

Click data guided query modeling with click propagation and sparse coding

  • Min Tan
  • Jun Yu
  • Qingming Huang
  • Weichen Wu

Abstract

We address the problem of fine-grained image recognition using user click data, wherein each image is represented as a semantic query-click feature vector. The query set obtained from search engines is usually large and redundant, which makes the click feature high-dimensional and sparse. We propose a novel query modeling approach that merges semantically similar queries and constructs a compact click feature from the merged queries. To deal with the sparsity and inconsistency of the click feature, we design a graph-based propagation approach that predicts zero-clicks, ensuring that similar images have similar clicks for each query. Using the propagated click feature, we then formulate the query merging problem as a sparse coding based recognition task, in which the hot queries are used to construct the dictionary. We evaluate our method for fine-grained image recognition on the public Clickture-Dog dataset. The results show that the propagated click feature performs much better than the original one, that sparse coding outperforms the traditional K-means algorithm in the query merging procedure, and that the “hot queries” outperform K-SVD for dictionary learning.
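To make the two main steps of the abstract concrete, the sketch below runs them on toy data: (i) propagating a sparse query-click matrix over a visual-similarity graph so that similar images receive similar clicks, and (ii) sparse-coding the propagated click vectors against a dictionary built from the most-clicked (“hot”) queries. All specifics here are illustrative assumptions, not the authors' exact formulation: the k-NN graph construction, the random-walk smoothing with parameter alpha, the Lasso-based coding, and the choice of 50 hot queries.

```python
# Illustrative sketch (assumed details, not the authors' exact algorithm):
# graph-based click propagation followed by sparse coding on hot queries.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
n_images, n_queries = 200, 500

# Toy data: visual features and a sparse, high-dimensional query-click matrix.
X_visual = rng.normal(size=(n_images, 64))
C = (rng.random((n_images, n_queries)) > 0.98).astype(float)  # mostly zero clicks

# Step 1 (assumed form): smooth clicks over a k-NN visual-similarity graph so
# that visually similar images obtain similar click vectors (zero-click prediction).
W = kneighbors_graph(X_visual, n_neighbors=10, mode="connectivity").toarray()
W = np.maximum(W, W.T)                                   # symmetrize the graph
P = np.diag(1.0 / np.maximum(W.sum(axis=1), 1e-12)) @ W  # row-stochastic transitions
alpha, n_iters = 0.8, 20
C_prop = C.copy()
for _ in range(n_iters):                                 # iterative click propagation
    C_prop = alpha * (P @ C_prop) + (1 - alpha) * C

# Step 2 (assumed form): build the dictionary from the most-clicked queries and
# sparse-code each query's propagated click column; the dominant code indicates
# which hot query a raw query should be merged into.
hot_idx = np.argsort(C_prop.sum(axis=0))[::-1][:50]      # top-50 "hot" queries
D = C_prop[:, hot_idx]                                    # dictionary: images x hot queries

codes = np.zeros((n_queries, len(hot_idx)))
coder = Lasso(alpha=0.01, positive=True, max_iter=5000)
for q in range(n_queries):
    coder.fit(D, C_prop[:, q])
    codes[q] = coder.coef_
merged_assignment = codes.argmax(axis=1)                  # merge query q into its hot query
print("compact click feature shape:", (n_images, len(hot_idx)))
```

Under this reading, using the hot queries directly as dictionary atoms avoids an explicit dictionary-learning step such as K-SVD, which is consistent with the abstract's report that the hot-query dictionary performs better.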

Keywords

Image recognition · Click data · Sparse coding · Query modeling · Graph based model

Notes

Acknowledgments

This work was partly supported by the National Natural Science Foundation of China (No. 61602136, No. 61622205, No. 61472110) and the Zhejiang Provincial Natural Science Foundation of China under Grant LR15F020002.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
  2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
