Unsupervised query-adaptive implicit subtopic discovery for diverse image retrieval based on intrinsic cluster quality

  • 1135T: Social Multimedia Processing
Multimedia Tools and Applications

Abstract

Given the complex search tasks imposed on social multimedia retrieval systems, similarity-based ranked results often contain redundant items, e.g., near-duplicates or unrepresentative samples. In this context, several real-world search tasks demand broad coverage of the multiple implicit subtopics of a given query in order to properly fulfill the user need. Many works have proposed result diversification to address this problem. A popular approach achieves diversification by grouping similar items obtained from the original ranked list; a new, diverse ranked list is then constructed by iteratively selecting a representative item from each cluster. However, defining the number of clusters (subtopics) to be discovered is a long-standing challenge. Moreover, most clustering optimization approaches for diversification rely on offline training to select a single general best configuration, which is then used for all queries at run time. This is a complex task given the multiple sources of heterogeneity involved (data, user, query, concepts, etc.) and their consequent impact on the effectiveness of retrieval algorithms. Therefore, such approaches are usually prone to overfitting. To attenuate these problems, this work proposes a novel diverse image retrieval approach based on unsupervised query-adaptive subtopic discovery through intrinsic clustering quality optimization. Our experimental analysis has shown significant improvements over the baseline in terms of both relevance and diversity.
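The general idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the intrinsic quality index or the clustering algorithm, so the sketch assumes the mean silhouette coefficient as the quality measure and takes the candidate clusterings (one per candidate number of subtopics) as given. The function names `silhouette`, `select_partition`, and `diversify`, and the toy feature vectors, are hypothetical.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient of a partition (assumed quality index)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a score of 0
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def select_partition(points, candidates):
    """Pick, for this query only, the candidate labeling with the best score."""
    return max(candidates, key=lambda labels: silhouette(points, labels))


def diversify(labels):
    """Round-robin over clusters; labels[i] is the cluster of the item at rank i.

    Clusters are visited in the order of their best-ranked item, and within a
    cluster items keep their original relative order.
    """
    queues = {}
    for rank, lab in enumerate(labels):
        queues.setdefault(lab, []).append(rank)
    reranked = []
    while len(reranked) < len(labels):
        for queue in queues.values():
            if queue:
                reranked.append(queue.pop(0))
    return reranked


# Toy feature vectors, listed in the order of the original ranked list.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
# Candidate partitions for k = 2 and k = 3 (assumed to come from any clusterer).
candidates = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2]]

best = select_partition(points, candidates)
print(best)             # [0, 0, 0, 1, 1, 2]
print(diversify(best))  # [0, 3, 5, 1, 4, 2]
```

In this toy case the three-cluster partition scores higher than the two-cluster one, so the number of subtopics is chosen per query without any offline training, and the diversified list surfaces one item from each discovered subtopic before returning to the first cluster.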

Notes

  1. http://www.flickr.com (As of July 2021)

Author information

Corresponding author

Correspondence to José Solenir Lima Figuerêdo.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Figuerêdo, J.S.L., Calumby, R.T. Unsupervised query-adaptive implicit subtopic discovery for diverse image retrieval based on intrinsic cluster quality. Multimed Tools Appl 81, 42991–43011 (2022). https://doi.org/10.1007/s11042-022-13050-4

  • DOI: https://doi.org/10.1007/s11042-022-13050-4
