Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature

The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology

Abstract

Media objects of different modalities usually occur together and are naturally complementary to each other, both in semantics and in modality. In this paper, we propose a manifold learning based cross-media retrieval approach that addresses two basic but crucial questions in understanding media object semantics and in cross-media retrieval. First, considering semantic complementarity, how can we represent co-occurring media objects and fuse the complementary information they carry so that the integrated semantics are understood precisely? Second, considering modality complementarity, how can we bridge modalities to establish a cross-index and facilitate cross-media retrieval? To solve these two problems, we first construct a Multimedia Document (MMD) Semi-Semantic Graph (MMDSSG) and then adopt Multidimensional Scaling to create an MMD Semantic Space (MMDSS). Both long-term and short-term feedback are proposed to boost system performance. Long-term feedback is used to refine the MMDSSG, and short-term feedback is used to introduce new items that are not in the training set into the MMDSS. Since all MMDs and their component media objects of different modalities lie in the MMDSS and are indexed uniformly by their coordinates in the MMDSS regardless of modality, this semantic space acts as a bridge between media objects of different modalities, and cross-media retrieval can be achieved easily. Experimental results are encouraging and indicate that the proposed approach is effective.
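As a rough illustration of the idea of indexing items by coordinates in a shared space, the following sketch applies classical Multidimensional Scaling to a small pairwise distance matrix and then retrieves the nearest item of another modality. It is not the authors' implementation: the toy points, the modality labels, and the helper names `classical_mds` and `retrieve` are assumptions made for this example, and the distance matrix stands in for distances that the paper would derive from the MMDSSG.

```python
import numpy as np

def classical_mds(dist, dim=2):
    """Embed n items in `dim` dimensions from an (n, n) symmetric distance matrix."""
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

def retrieve(coords, modalities, query_idx, target_modality, k=1):
    """Return indices of the k nearest items of `target_modality` to the query."""
    dists = np.linalg.norm(coords - coords[query_idx], axis=1)
    ranked = [int(i) for i in np.argsort(dists)
              if i != query_idx and modalities[i] == target_modality]
    return ranked[:k]

# Hypothetical toy data: two semantic clusters, each holding one image and one audio clip.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
modalities = ["image", "audio", "image", "audio"]

# Pairwise distances (assumed here; in the paper they would come from the graph).
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, dim=2)
embedded_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Query with the image at index 0 and ask for the closest audio item.
nearest_audio = retrieve(coords, modalities, query_idx=0, target_modality="audio")
print(nearest_audio)
```

Because every item is addressed only by its coordinates, the same `retrieve` call works for any pair of query and target modalities, which is the sense in which a shared space can serve as a bridge between modalities.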


Author information

Corresponding author

Correspondence to Yi Yang.

About this article

Cite this article

Zhuang, Y., Yang, Y., Wu, F. et al. Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature. J VLSI Sign Process Syst Sign Image Video Technol 46, 153–164 (2007). https://doi.org/10.1007/s11265-006-0020-y

  • DOI: https://doi.org/10.1007/s11265-006-0020-y