Visual understanding by mining social media: recent advances and challenges

  • Review Article
Frontiers of Computer Science

Abstract

The rapid growth of social websites has dramatically increased the volume of social media, including images and videos, and visual understanding has consequently attracted great interest in areas such as multimedia, computer vision, and pattern recognition. Valuable auxiliary resources available on social websites, such as user-provided tags, aid visual understanding tasks. Several methods have therefore been proposed to exploit these auxiliary resources for tag refinement, image retrieval, and media summarization. This work presents a comprehensive survey of recent advances in visual understanding by mining social media and discusses their merits and limitations. We then analyze the difficulties and challenges of visual understanding and outline several possible future research directions.

Acknowledgements

This work was partially supported by the National Basic Research Program of China (973 Program) (2014CB347600), the National Natural Science Foundation of China (Grant Nos. 61522203 and U1611461), the Natural Science Foundation of Jiangsu Province (BK20140058), and the National Ten Thousand Talent Program of China (Young Top-Notch Talent).

Author information

Corresponding author

Correspondence to Jinhui Tang.

Additional information

Xueming Wang received the BS degree in communication engineering from Qingdao Technological University, China in 2009. He is currently a PhD candidate in computer science and technology at Nanjing University of Science and Technology, China. In 2012, he was a visiting student at the Institute of Computing Technology, Chinese Academy of Sciences, China. He was an intern at MSRA for three months and an intern at the National University of Singapore, Singapore for one year. His current research interests include social media analysis and multimedia question answering.

Zechao Li is an associate professor in the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. He received the PhD degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China in 2013. His research interests include large-scale multimedia understanding and social media analysis. He has authored over 60 journal and conference papers in these areas. He received the Young Talent Program of China Association for Science and Technology, the Excellent Doctoral Dissertation of Chinese Academy of Sciences and the Excellent Doctoral Theses of China Computer Federation.

Jinhui Tang is a professor in School of Computer Science and Engineering, Nanjing University of Science and Technology, China. He received his BE and PhD degrees in July 2003 and July 2008 respectively, both from the University of Science and Technology of China, China. From 2008 to 2010, he worked as a research fellow in School of Computing, National University of Singapore, Singapore. His current research interest is in large-scale multimedia search. He has authored over 150 journal and conference papers in the area. Prof. Tang is a co-recipient of the Best Paper Awards in ACM MM 2007, PCM 2011 and ICIMCS 2011, and the Best Student Paper Award in MMM 2016. He is an awardee of the NSFC Excellent Young Scholars Program in 2015.

Cite this article

Wang, X., Li, Z. & Tang, J. Visual understanding by mining social media: recent advances and challenges. Front. Comput. Sci. 12, 406–422 (2018). https://doi.org/10.1007/s11704-017-6377-1

  • DOI: https://doi.org/10.1007/s11704-017-6377-1
