Text Mining in Multimedia

Abstract

A large amount of multimedia data (e.g., images and videos) is now available on the Web. A multimedia entity does not appear in isolation but is accompanied by various forms of metadata, such as surrounding text, user tags, ratings, and comments. Mining this textual metadata has been found to be effective in facilitating multimedia information processing and management, and a wealth of research effort has been dedicated to text mining in multimedia. This chapter provides a comprehensive survey of recent research efforts. Specifically, the survey focuses on four aspects: (a) surrounding text mining; (b) tag mining; (c) joint text and visual content mining; and (d) cross text and visual content mining. Furthermore, open research issues are identified based on the current research efforts.

Author information

Corresponding author

Correspondence to Zheng-Jun Zha.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Zha, ZJ., Wang, M., Shen, J., Chua, TS. (2012). Text Mining in Multimedia. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_11

  • DOI: https://doi.org/10.1007/978-1-4614-3223-4_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-3222-7

  • Online ISBN: 978-1-4614-3223-4

  • eBook Packages: Computer Science (R0)
