Abstract
Given the complex search tasks imposed on social multimedia retrieval systems, similarity-based ranked results often contain redundant item sets, including, e.g., near-duplicates or unrepresentative samples. In this context, several real-world search tasks demand broad coverage of multiple implicit subtopics of a given query in order to properly fulfill the user need. Many works have proposed result diversification to address this problem. In one popular approach, diversification is achieved by grouping similar items obtained from the original ranked list. A new and diverse ranked list is then constructed by iteratively selecting a representative item from each cluster. However, defining the number of clusters (subtopics) to be discovered is a long-standing challenge. Moreover, most clustering optimization approaches for diversification rely on offline training to select a single general best configuration, which is then used for all queries at run-time. This is a complex task given the multiple sources of heterogeneity involved (data, user, query, concepts, etc.) and their consequent impact on the effectiveness of retrieval algorithms. Therefore, such approaches are usually prone to overfitting. To attenuate these problems, this work proposes a novel diverse image retrieval approach based on unsupervised query-adaptive subtopic discovery through intrinsic clustering quality optimization. Our experimental analysis has shown significant improvements over the baseline, both in terms of relevance and diversity.
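The pipeline outlined in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: the silhouette coefficient stands in for the intrinsic quality index, candidate partitions are supplied by the caller rather than produced by a particular clustering algorithm, and the representative of a cluster is taken to be its best-ranked remaining item. The function names `silhouette`, `best_partition`, and `round_robin` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient of a hard partition (assumed quality index)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst here
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_partition(points, candidate_labelings):
    """Per-query choice: keep the candidate partition with the best quality."""
    return max(candidate_labelings, key=lambda labels: silhouette(points, labels))


def round_robin(ranked_ids, labels):
    """Rebuild the ranking by taking one item per cluster in each pass.

    ranked_ids is in the original relevance order and labels[i] is the
    cluster of ranked_ids[i]. Clusters are visited in the order of their
    best-ranked item (an assumption of this sketch).
    """
    queues = defaultdict(list)
    order = []
    for item, lab in zip(ranked_ids, labels):
        if lab not in queues:
            order.append(lab)
        queues[lab].append(item)
    result = []
    depth = 0
    while len(result) < len(ranked_ids):
        for lab in order:
            if depth < len(queues[lab]):
                result.append(queues[lab][depth])
        depth += 1
    return result


# Toy query: five results described by 2-D feature vectors (made-up data).
ids = ["a", "b", "c", "d", "e"]
points = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5)]
candidates = [
    [0, 0, 1, 1, 0],  # two subtopics
    [0, 1, 2, 2, 0],  # three subtopics
]
chosen = best_partition(points, candidates)
diverse = round_robin(ids, chosen)
```

Because the partition is chosen separately for each query from its own result set, no offline training or global number of clusters is needed in this sketch; any other intrinsic index could replace `silhouette` without changing the rest.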
Notes
http://www.flickr.com (As of July 2021)
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Figuerêdo, J.S.L., Calumby, R.T. Unsupervised query-adaptive implicit subtopic discovery for diverse image retrieval based on intrinsic cluster quality. Multimed Tools Appl 81, 42991–43011 (2022). https://doi.org/10.1007/s11042-022-13050-4
DOI: https://doi.org/10.1007/s11042-022-13050-4