Building effective SVM concept detectors from clickthrough data for large-scale image retrieval

  • Ioannis Sarafis
  • Christos Diou
  • Anastasios Delopoulos
Regular Paper


Clickthrough data is a source of information that can be used to automatically build concept detectors for image retrieval. Previous studies, however, have shown that in many cases the resulting training sets suffer from severe label noise, which significantly degrades SVM concept detector performance. This paper proposes and evaluates a set of strategies for automatically building effective concept detectors from clickthrough data. These strategies focus on: (1) automatic training set generation; (2) assignment of label confidence weights to the training samples; and (3) use of these weights at the classifier level to improve concept detector effectiveness. For training set selection and for assigning weights to individual training samples, three Information Retrieval (IR) models are examined: vector space models, BM25 and language models. Three SVM variants that take sample importance into account at the classifier level are evaluated and compared to the standard SVM: the Fuzzy SVM, the Power SVM, and the Bilateral-weighted Fuzzy SVM. Experiments conducted on the MM Grand Challenge dataset (consisting of 1M images and 82.3M unique clicks) for 40 concepts demonstrate that (1) on average, all weighted SVM variants are more effective than the standard SVM; (2) the vector space model produces the best training sets and the best weights; (3) the Bilateral-weighted Fuzzy SVM produces the best results but is very sensitive to weight assignment; and (4) the Fuzzy SVM is the most robust training approach for varying levels of label noise.
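To make the weighting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each image is represented by the text of the queries that led to clicks on it, scores that text against the concept name with a TF-IDF vector space model, and treats the cosine similarity as the label confidence weight. The function names, the toy click log and the smoothed IDF variant are all illustrative assumptions. The last function shows how such weights would enter a Fuzzy-SVM-style objective, in which each sample's slack penalty is scaled by its weight.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight) for tokenized docs, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(n / df) + 1, so terms shared by all docs keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_confidence(concept, clicked_queries):
    """Map each image id to a confidence in [0, 1] that it depicts the concept.

    clicked_queries: image id -> list of (query text, click count) pairs (hypothetical layout).
    """
    ids = list(clicked_queries)
    # Each image's "document" is its query text, with tokens repeated once per click.
    docs = [[tok for query, clicks in clicked_queries[i]
             for tok in query.lower().split() * clicks] for i in ids]
    vectors, idf = tfidf_vectors(docs)
    concept_vec = {t: idf[t] for t in concept.lower().split() if t in idf}
    return {i: cosine(concept_vec, v) for i, v in zip(ids, vectors)}


def weighted_hinge_loss(weights, labels, scores, C=1.0):
    """Weighted slack term C * sum_i s_i * max(0, 1 - y_i * f(x_i))."""
    return C * sum(s * max(0.0, 1.0 - y * f)
                   for s, y, f in zip(weights, labels, scores))


# Toy click log (fabricated for illustration only).
clicks = {
    "img1": [("red sports car", 5), ("car", 2)],
    "img2": [("car rental prices", 1), ("cheap flights", 4)],
    "img3": [("kitten", 3)],
}
weights = label_confidence("car", clicks)
```

In this toy log, the image whose clicks come mostly from car-related queries receives a higher weight than the image clicked mainly for unrelated queries, and an image with no matching query terms receives zero. In practice such weights could be passed to an SVM implementation that supports per-sample weights, so that low-confidence samples contribute less to the training loss.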


Keywords: Clickthrough data, Concept-based image retrieval, SVM, Fuzzy SVM, Power SVM, Bilateral-weighted Fuzzy SVM



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Ioannis Sarafis (1), corresponding author
  • Christos Diou (1)
  • Anastasios Delopoulos (1)
  1. Multimedia Understanding Group, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki, Greece
