Online Multimodal Co-indexing and Retrieval of Social Media Data

  • Lei Meng
  • Ah-Hwee Tan
  • Donald C. Wunsch II
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


Effective indexing of social media data is key to searching for information on the social Web. However, the characteristics of social media data make this a challenging task. The first challenge is the large-scale and streaming nature of the data, which requires the indexing algorithm to update the indexing structure efficiently as data streams arrive. The second challenge is utilizing the rich meta-information of social media data to better evaluate the similarity between data objects and to index the data in a more semantically meaningful way, so that users can search with the types of queries they prefer. Existing approaches based on either matrix operations or hashing usually cannot perform an online update of the indexing base to encode incoming data streams, and they have difficulty handling noisy data. This chapter presents a study on using the Online Multimodal Co-indexing Adaptive Resonance Theory (OMC-ART) for effective and efficient indexing and retrieval of social media data. More specifically, two types of social media data are considered: (1) weakly supervised image data, which is associated with captions, tags and descriptions given by the users; and (2) e-commerce product data, which includes product images, titles, descriptions and user comments. These scenarios relate this study to multimodal web image indexing and retrieval. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, instead of a flat indexing structure, OMC-ART builds a two-layer one: the first layer co-indexes the images by their key visual and textual features, based on the generalized distributions of the clusters they belong to, while the second layer co-indexes the data objects by their own feature distributions.
Third, OMC-ART enables flexible multimodal searching using visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to go through the whole indexing system when only a limited number of images need to be retrieved. Experiments on two publicly accessible image datasets and a real-world e-commerce dataset demonstrate the efficiency and effectiveness of OMC-ART. The content of this chapter is summarized and extended from [13], and the Python code of OMC-ART with examples on building an e-commerce product search engine is available at
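
The four characteristics listed in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' OMC-ART implementation: it assumes a simplified Fuzzy-ART-style matching rule, features already scaled to [0, 1], a single vigilance threshold shared by both modalities, and an unweighted average of the per-modality scores. All names (`TwoLayerIndex`, `similarity`, `vigilance`, `learning_rate`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


def fuzzy_and(x, w):
    """Element-wise minimum (the fuzzy AND operator used by Fuzzy ART)."""
    return [min(a, b) for a, b in zip(x, w)]


def match(x, w):
    """Fraction of the input x that is covered by the vector w."""
    total = sum(x)
    return sum(fuzzy_and(x, w)) / total if total > 0 else 1.0


def similarity(query_visual, query_textual, visual, textual):
    """Average match over whichever query modalities are provided."""
    parts = []
    if query_visual is not None:
        parts.append(match(query_visual, visual))
    if query_textual is not None:
        parts.append(match(query_textual, textual))
    if not parts:
        raise ValueError("provide visual features, textual features, or both")
    return sum(parts) / len(parts)


@dataclass
class Cluster:
    visual: list   # layer 1: generalized visual prototype of the cluster
    textual: list  # layer 1: generalized textual prototype of the cluster
    members: list = field(default_factory=list)  # layer 2: (id, visual, textual)


class TwoLayerIndex:
    def __init__(self, vigilance=0.6, learning_rate=0.5):
        self.vigilance = vigilance          # assumed shared by both modalities
        self.learning_rate = learning_rate
        self.clusters = []

    def add(self, obj_id, visual, textual):
        """Online update: index one incoming object without rebuilding."""
        ranked = sorted(
            self.clusters,
            key=lambda c: similarity(visual, textual, c.visual, c.textual),
            reverse=True,
        )
        for cluster in ranked:
            if (match(visual, cluster.visual) >= self.vigilance
                    and match(textual, cluster.textual) >= self.vigilance):
                b = self.learning_rate
                # Move each prototype toward its overlap with the new object.
                cluster.visual = [b * m + (1 - b) * w for m, w in
                                  zip(fuzzy_and(visual, cluster.visual), cluster.visual)]
                cluster.textual = [b * m + (1 - b) * w for m, w in
                                   zip(fuzzy_and(textual, cluster.textual), cluster.textual)]
                break
        else:
            # No existing cluster is similar enough: start a new one.
            cluster = Cluster(list(visual), list(textual))
            self.clusters.append(cluster)
        cluster.members.append((obj_id, list(visual), list(textual)))

    def search(self, k, visual=None, textual=None):
        """Rank clusters first, then objects cluster by cluster; stop at k."""
        ranked = sorted(
            self.clusters,
            key=lambda c: similarity(visual, textual, c.visual, c.textual),
            reverse=True,
        )
        results = []
        for cluster in ranked:
            members = sorted(
                cluster.members,
                key=lambda m: similarity(visual, textual, m[1], m[2]),
                reverse=True,
            )
            results.extend(obj_id for obj_id, _, _ in members)
            if len(results) >= k:
                break  # remaining clusters are never visited
        return results[:k]


index = TwoLayerIndex()
index.add("a", [1.0, 0.0], [1.0, 0.0])
index.add("b", [0.9, 0.1], [1.0, 0.0])
index.add("c", [0.0, 1.0], [0.0, 1.0])
print(index.search(2, visual=[1.0, 0.0]))   # visual-only query
print(index.search(1, textual=[0.0, 1.0]))  # keyword-only query
```

In this toy run, objects "a" and "b" fall into one cluster and "c" starts another. A query is first compared against the cluster prototypes, and only the members of the best-matching clusters are ranked, which is the sense in which retrieval can stop early when few results are requested.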


  1. Caicedo JC, Moreno JG, Niño EA, González FA (2010) Combining visual features and text data for medical image retrieval using latent semantic kernels. In: Proceedings of the international conference on multimedia information retrieval, pp 359–366
  2. Caicedo JC, BenAbdallah J, González FA, Nasraoui O (2012) Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization. Neurocomputing 76(1):50–60
  3. Chandrika P, Jawahar CV (2010) Multi modal semantic indexing for image retrieval. In: Proceedings of the international conference on image and video retrieval, pp 342–349
  4. Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: CIVR, pp 1–9
  5. Duygulu P, Barnard K, de Freitas JF, Forsyth DA (2002) Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: ECCV, pp 97–112
  6. Escalante HJ, Montes M, Sucar E (2012) Multimodal indexing based on semantic cohesion for image retrieval. Inf Retr 15(1):1–32
  7. Gong Y, Wang L, Hodosh M, Hockenmaier J, Lazebnik S (2014) Improving image-sentence embeddings using large weakly annotated photo collections. In: Proceedings of the European conference on computer vision (ECCV), pp 529–545
  8. Gonzalez F, Caicedo J (2010) NMF-based multimodal image indexing for querying by visual example. In: Proceedings of the international conference on image and video retrieval, pp 366–373
  9. Li M, Xue XB, Zhou ZH (2009) Exploiting multi-modal interactions: a unified framework. In: IJCAI, pp 1120–1125
  10. Lienhart R, Romberg S, Hörster E (2009) Multilayer pLSA for multimodal image retrieval. In: Proceedings of the ACM international conference on image and video retrieval
  11. Mei T, Rui Y, Li S, Tian Q (2014) Multimedia search reranking: a literature survey. ACM Comput Surv (CSUR) 46(3):38
  12. Meng L, Tan AH, Xu D (2014) Semi-supervised heterogeneous fusion for multimedia data co-clustering. IEEE Trans Knowl Data Eng 26(9):2293–2306
  13. Meng L, Tan AH, Leung C, Nie L, Chua TS, Miao C (2015) Online multimodal co-indexing and retrieval of weakly labeled web image collections. In: Proceedings of the 5th ACM on international conference on multimedia retrieval. ACM, pp 219–226
  14. Mu Y, Shen J, Yan S (2010) Weakly-supervised hashing in kernel space. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3344–3351
  15. Nie L, Wang M, Zha ZJ, Li G, Chua TS (2011) Multimedia answering: enriching text QA with media information. In: SIGIR, pp 695–704
  16. Nie L, Wang M, Gao Y, Zha ZJ, Chua TS (2013) Beyond text QA: multimedia answer generation by harvesting web information. IEEE Trans Multimed 15(2):426–441
  17. Smeulders AW, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
  18. Su JH, Wang BW, Hsu TY, Chou CL, Tseng VS (2010) Multi-modal image retrieval by integrating web image annotation, concept matching and fuzzy ranking techniques. Int J Fuzzy Syst 12(2):136–149
  19. Yu FX, Ji R, Tsai MH, Ye G, Chang SF (2012) Weak attributes for large-scale image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2949–2956
  20. Zhang S, Yang M, Wang X, Lin Y, Tian Q (2013) Semantic-aware co-indexing for image retrieval. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1673–1680

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. NTU-UBC Research Center of Excellence in Active Living for the Elderly (LILY), Nanyang Technological University, Singapore, Singapore
  2. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
  3. Applied Computational Intelligence Laboratory, Missouri University of Science and Technology, Rolla, USA
