A novel multimodal clustering framework for images with diverse associated text

  • Chandramani Chaudhary
  • Poonam Goyal
  • Siddhant Tuli
  • Shuchita Banthia
  • Navneet Goyal
  • Yi-Ping Phoebe Chen


With the enormous growth in the number of images on the web, image clustering has become an essential part of any image retrieval system. Since web images are often accompanied by related text or tags, both visual and textual features can be exploited to improve the precision of web image clustering. Existing clustering methods either use the two kinds of features separately in a specific order, or use them simultaneously but independently. In this work, we propose a new framework, Multimodal Hierarchical Clustering for Images (MHCI), which exploits the coexistence of visual and textual patterns to establish a relationship between them. We propose textual and visual weights to quantify the relationship established between images and their features. The proposed framework can be applied to a wide variety of image datasets with different characteristics, viz., search results with noisy surrounding text, and tagged images. It can also cluster image search queries and their corresponding clicked images. The respective datasets used are image search results, Flickr (NUS-WIDE), and Clickture (Bing query log). The framework is shown to be versatile on the Clickture dataset, which has not been examined by any of the previous approaches. The experimental results show that MHCI significantly improves the quality of image clusters compared with existing methods.
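
The general idea of clustering images on both modalities at once can be illustrated with a minimal sketch. It is not the MHCI algorithm, whose bipartite-graph construction and weight definitions are not given in the abstract. The sketch assumes a fixed-weight blend of tag overlap (Jaccard) and visual-feature similarity (cosine), followed by average-linkage agglomerative clustering. The names `combined_similarity`, `w_text`, `w_visual`, and `threshold`, and the toy data, are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(s, t):
    """Jaccard similarity of two tag sets."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def combined_similarity(img_a, img_b, w_text=0.5, w_visual=0.5):
    """Hypothetical fixed-weight blend of textual and visual similarity.

    Assumption: the paper derives its own textual and visual weights;
    constants are used here only for illustration.
    """
    return (w_text * jaccard(img_a["tags"], img_b["tags"])
            + w_visual * cosine(img_a["visual"], img_b["visual"]))


def agglomerative(images, threshold=0.4, **weights):
    """Average-linkage agglomerative clustering over image indices.

    Merging stops once the best pair of clusters falls below `threshold`.
    """
    clusters = [[i] for i in range(len(images))]

    def linkage(c1, c2):
        sims = [combined_similarity(images[i], images[j], **weights)
                for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (a < b by construction).
        (a, b), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data: the ambiguous tag "apple" appears on fruit and phone images alike.
images = [
    {"tags": {"apple", "fruit"}, "visual": [1.0, 0.0]},
    {"tags": {"apple", "red"}, "visual": [0.9, 0.1]},
    {"tags": {"apple", "iphone"}, "visual": [0.0, 1.0]},
    {"tags": {"iphone", "phone"}, "visual": [0.1, 0.9]},
]
print(agglomerative(images))  # [[0, 1], [2, 3]]
```

In this toy example the shared tag "apple" alone cannot separate the fruit images from the phone images, while the visual vectors can, so the blended similarity produces two clusters. A real system would replace the two-dimensional vectors with extracted visual descriptors and the constants with weights learned or derived from the data.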


Keywords: Bipartite graph; Hierarchical agglomerative clustering; Query-log; Search result clustering; Tags and surrounding text




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Information Systems, BITS Pilani, Pilani, India
  2. Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
